<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[masirjafri]]></title><description><![CDATA[masirjafri]]></description><link>https://blog.masirjafri.in</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1766650238510/5242db67-ef73-45a8-b3dd-139885d1d43c.jpeg</url><title>masirjafri</title><link>https://blog.masirjafri.in</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 14:13:22 GMT</lastBuildDate><atom:link href="https://blog.masirjafri.in/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Understanding Monorepos with Turborepo (Hands-On Guide)]]></title><description><![CDATA[As the name suggests, a monorepo is a single repository (for example, one GitHub repo) that contains all parts of your application. 
This includes frontend code, backend services, DevOps scripts, shared libraries, configs, and anything else related to...]]></description><link>https://blog.masirjafri.in/understanding-monorepos-with-turborepo</link><guid isPermaLink="true">https://blog.masirjafri.in/understanding-monorepos-with-turborepo</guid><category><![CDATA[turborepo]]></category><category><![CDATA[monorepo]]></category><category><![CDATA[Build Tools]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[ci-cd]]></category><category><![CDATA[devtools]]></category><category><![CDATA[full stack]]></category><category><![CDATA[Node.js]]></category><category><![CDATA[React]]></category><dc:creator><![CDATA[Masir Jafri]]></dc:creator><pubDate>Thu, 25 Dec 2025 08:48:08 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766651648981/04d7ca77-749a-4fa1-af03-96408fd8be99.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As the name suggests, a <strong>monorepo</strong> is a single repository (for example, one GitHub repo) that contains all parts of your application. This includes frontend code, backend services, DevOps scripts, shared libraries, configs, and anything else related to the project. Instead of splitting things across multiple repositories, everything lives in one place.</p>
<p>Some popular projects that use monorepos are:</p>
<p><a target="_blank" href="https://github.com/resend/react-email">https://github.com/resend/react-email</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766585072671/80d8861c-c5a8-4b25-b61a-3c814c4beeb4.png" alt class="image--center mx-auto" /></p>
<p><a target="_blank" href="https://github.com/calcom/cal.com/tree/main">https://github.com/calcom/cal.com/tree/main</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766585142738/86a3971a-65e8-4389-b1dc-8c0efcea10f0.png" alt class="image--center mx-auto" /></p>
<p>These are real production-grade open-source examples where multiple apps and packages coexist inside one repository.</p>
<h2 id="heading-why-not-just-use-simple-folders">Why Not Just Use Simple Folders?</h2>
<p>Now a common question that might arise is:</p>
<blockquote>
<p><em>“Why do I even need a monorepo? Why can’t I just create folders like</em> <code>/frontend</code>, <code>/backend</code>, and <code>/devops</code> inside one repository?”</p>
</blockquote>
<p>The answer is <strong>you absolutely can</strong>, and in some cases, you actually should.</p>
<p>If your services are highly decoupled, don’t share code, and don’t depend on each other at all, simple folders (or even separate repositories) work perfectly fine. For example, if you have a Golang backend service and a completely independent JavaScript service, there’s no strong reason to force them into a monorepo with shared tooling.</p>
<p>Monorepos start making sense when multiple parts of your system evolve together and <strong>share code or configurations</strong>.</p>
<h2 id="heading-why-monorepos-are-useful">Why Monorepos Are Useful</h2>
<p>One of the biggest advantages of a monorepo is <strong>shared code reuse</strong>. You can create shared UI components, common utility functions, shared TypeScript types, or validation logic in one place and use them across multiple applications. This avoids duplication and prevents version mismatch issues.</p>
<p>Another big benefit is <strong>collaboration</strong>. When frontend and backend live in the same repository, changes that span multiple services become easier to manage. A single pull request can update both sides, and everyone has visibility into the full system instead of just one isolated part.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766586721541/7f44f9e2-c905-48f8-86c8-590918e8e97e.png" alt class="image--center mx-auto" /></p>
<p>Monorepos also shine when it comes to <strong>build optimization and CI/CD performance</strong>. Tools like <strong>Turborepo</strong> can cache build outputs and skip unnecessary work if nothing has changed. This drastically reduces build and test times, especially in large projects and CI pipelines.</p>
<p>Finally, <strong>centralized tooling</strong> makes things simpler. Instead of maintaining separate ESLint configs, TypeScript configs, and formatting rules for every project, you can define them once and reuse them everywhere. This leads to more consistency and less maintenance overhead.</p>
<h2 id="heading-common-monorepo-tools-in-the-nodejs-ecosystem">Common Monorepo Tools in the Node.js Ecosystem</h2>
<p>There are multiple tools available to manage monorepos in Node.js. <a target="_blank" href="https://lerna.js.org/"><strong>Lerna</strong></a> and <a target="_blank" href="https://github.com/nrwl/nx"><strong>Nx</strong></a> are full-fledged monorepo frameworks that handle dependency management and workspace configuration. <a target="_blank" href="https://classic.yarnpkg.com/lang/en/docs/workspaces/"><strong>Yarn</strong>/<strong>npm-workspaces</strong></a> provide a simpler approach focused on dependency linking.</p>
<p><a target="_blank" href="https://turborepo.com/"><strong>Turborepo</strong></a>, which we’ll focus on, is slightly different. It is <strong>not exactly a monorepo framework</strong>. Instead, it acts as a <strong>build system orchestrator</strong>, and that distinction is important.</p>
<h2 id="heading-a-bit-of-turborepo-history">A Bit of Turborepo History</h2>
<p>Turborepo was created by <strong>Jared Palmer</strong> and later acquired by <strong>Vercel</strong> in December 2021. Since then, it has become one of the most widely used tools for managing builds in large JavaScript and TypeScript monorepos. Vercel itself uses Turborepo heavily across its internal projects.</p>
<h2 id="heading-build-system-vs-build-system-orchestrator-vs-monorepo-framework">Build System vs Build System Orchestrator vs Monorepo Framework</h2>
<p>A <strong>build system</strong> is the tool that actually transforms your code. In JavaScript and TypeScript projects, this includes compiling TypeScript, bundling files, minifying output, running tests, and so on. Examples include <code>tsc</code>, <code>vite</code>, <code>webpack</code>, and <code>esbuild</code>.</p>
<p>A <strong>build system orchestrator</strong>, which is where Turborepo fits, does not perform these tasks itself. Instead, it coordinates them. Turborepo lets you define tasks like <code>build</code>, <code>dev</code>, or <code>lint</code>, and then runs the appropriate commands in each package using the actual build tools you choose.</p>
<p>A <strong>monorepo framework</strong> focuses more on workspace structure, dependency relationships, and package management. Nx and Lerna fall into this category.</p>
<h2 id="heading-what-makes-turborepo-powerful">What Makes Turborepo Powerful</h2>
<p>The biggest reason Turborepo is fast is <strong>caching</strong>. If you run a build and then run it again without changing any relevant files, Turborepo simply reuses the cached output instead of rebuilding everything from scratch. This is especially useful in CI environments.</p>
<p>Turborepo also supports <strong>parallel execution</strong>, meaning independent tasks can run at the same time instead of sequentially. This makes much better use of your machine’s resources and significantly reduces total build time.</p>
<p>On top of that, Turborepo understands the <strong>dependency graph</strong> of your monorepo. It knows which packages depend on which others and ensures tasks run in the correct order.</p>
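<p>Under the hood, this ordering comes from the task configuration in the root <code>turbo.json</code>. As a rough sketch (the exact file generated by <code>create-turbo</code> may differ), it looks like this:</p>
<pre><code class="lang-json">{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": [".next/**", "dist/**"]
    },
    "lint": {},
    "dev": {
      "cache": false,
      "persistent": true
    }
  }
}
</code></pre>
<p>The <code>^</code> in <code>"^build"</code> means “build this package’s dependencies first”, which is how the dependency graph translates into execution order.</p>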
<h2 id="heading-initializing-a-turborepo">Initializing a Turborepo</h2>
<p>To create a new Turborepo, you can simply run:</p>
<pre><code class="lang-bash">npx create-turbo@latest
</code></pre>
<p>During setup, select <strong>npm workspaces</strong> when prompted. Once initialization is complete, you’ll see a structured folder layout with apps and packages already wired together.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766587726499/c54994f0-354b-4314-8777-d7a178eba4a4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-understanding-the-folder-structure">Understanding the Folder Structure</h2>
<p>By default, the generated project contains two main sections: <strong>apps</strong> and <strong>packages</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766588134118/2104299f-4b92-4da0-b60d-dd2ab9e941ef.png" alt class="image--center mx-auto" /></p>
<p>The <strong>apps</strong> folder contains end-user applications such as websites and backend services. For example, <code>apps/web</code> is a Next.js website, and <code>apps/docs</code> is a documentation site.</p>
<p>The <strong>packages</strong> folder contains helper and shared modules. This includes UI components, shared TypeScript configuration, and ESLint configuration. These packages are meant to be reused across multiple apps.</p>
<h2 id="heading-running-the-project">Running the Project</h2>
<p>From the root folder, running:</p>
<pre><code class="lang-bash">npm run dev
</code></pre>
<p>starts all development servers at once. You’ll notice multiple applications running on different ports, such as <a target="_blank" href="http://localhost:3000"><code>localhost:3000</code></a> and <a target="_blank" href="http://localhost:3001"><code>localhost:3001</code></a>.</p>
<p>This demonstrates how multiple apps can live in one repo while sharing common code.</p>
<h2 id="heading-exploring-root-packagejson"><strong>Exploring root package.json</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766588994552/bd34d5c1-7568-45ac-8ae3-44f4428b2986.png" alt class="image--center mx-auto" /></p>
<p>This defines what actually runs when you execute:</p>
<ol>
<li><p><code>npm run build</code></p>
</li>
<li><p><code>npm run dev</code></p>
</li>
<li><p><code>npm run lint</code></p>
</li>
</ol>
<p><code>turbo build</code> goes into every app and package and runs its <code>build</code> script (provided one is defined). The same applies to <code>dev</code> and <code>lint</code>.</p>
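<p>In text form, the root <code>package.json</code> looks roughly like this (exact versions and extra fields depend on the template you generated):</p>
<pre><code class="lang-json">{
  "name": "my-turborepo",
  "private": true,
  "scripts": {
    "build": "turbo build",
    "dev": "turbo dev",
    "lint": "turbo lint"
  },
  "devDependencies": {
    "turbo": "^2.0.0"
  },
  "workspaces": ["apps/*", "packages/*"]
}
</code></pre>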
<h2 id="heading-exploring-packagesui"><strong>Exploring packages/ui</strong></h2>
<ol>
<li><h3 id="heading-packagejson"><code>package.json</code></h3>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766642366278/14efa811-d536-4f49-a88c-e321b08c0c32.png" alt class="image--center mx-auto" /></p>
<ol start="2">
<li><h3 id="heading-srcbuttontsx"><code>src/button.tsx</code></h3>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766643349240/ecb42fb4-107d-4acf-a99d-10f21d1ec036.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-exploring-appsweb"><strong>Exploring apps/web</strong></h2>
<ol>
<li><h3 id="heading-dependencies">Dependencies</h3>
</li>
</ol>
<p>It is a simple Next.js app, but it uses UI components from the <code>packages/ui</code> module.</p>
<ol start="2">
<li><h3 id="heading-packagejson-1"><code>package.json</code></h3>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766643881430/6a111993-7267-427e-bd6b-4090ae95424e.png" alt class="image--center mx-auto" /></p>
<p>If you explore the <code>package.json</code> of <code>apps/web</code>, you will notice <code>@repo/ui</code> listed as a dependency.</p>
<ol start="3">
<li><h3 id="heading-pagetsx"><code>page.tsx</code></h3>
</li>
</ol>
<p>The <code>packages/ui</code> module exports reusable UI components. These components are imported into applications like <code>apps/web</code> and <code>apps/docs</code> using a workspace dependency such as <code>@repo/ui</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766644393124/404c6b75-f8a5-4fb2-a7ee-bdfa60d1082c.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1766644416323/85a19a26-71ba-4c78-a315-ce2cf0d54658.png" alt class="image--center mx-auto" /></p>
<p>When you import a component like <code>Button</code> into a Next.js page, it behaves just like a normal local dependency, even though it lives in a different folder.</p>
<h2 id="heading-adding-a-new-page-using-shared-ui">Adding a New Page Using Shared UI</h2>
<p>To add a new <code>/admin</code> page in the web app, you can create an <code>Admin</code> component inside <code>packages/ui</code>, export it, and then use it inside <code>apps/web/app/admin/page.tsx</code>.</p>
<p>Create <code>Admin.tsx</code> in <code>packages/ui/src</code> with the following code:</p>
<pre><code class="lang-javascript"><span class="hljs-string">"use client"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Admin = <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-keyword">return</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>
        hey this is admin component
    <span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span></span>
  );
};
</code></pre>
<p>After running:</p>
<pre><code class="lang-bash">npm run dev
</code></pre>
<p>the page becomes available at:</p>
<pre><code class="lang-bash">http://localhost:3000/admin
</code></pre>
<p>Turborepo also provides generators that can automate these steps, making it faster to scaffold new components.</p>
<h2 id="heading-adding-a-react-app-vite-to-the-monorepo">Adding a React App (Vite) to the Monorepo</h2>
<p>Go to the <code>apps</code> folder:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> apps
</code></pre>
<p>Create a fresh Vite app:</p>
<pre><code class="lang-bash">npm create vite@latest
</code></pre>
<p>Update the <code>package.json</code> of the new React app to include the shared UI package:</p>
<pre><code class="lang-json"><span class="hljs-string">"@repo/ui"</span>: <span class="hljs-string">"*"</span>
</code></pre>
<p>Go back to the root folder and install dependencies:</p>
<pre><code class="lang-bash"><span class="hljs-built_in">cd</span> ..
npm install
</code></pre>
<p>Start the dev servers:</p>
<pre><code class="lang-bash">npm run dev
</code></pre>
<p>Try importing a component from the <code>ui</code> package and rendering it in your React app.</p>
<p>Now, add a <code>turbo.json</code> file inside the React app folder to override the build outputs:</p>
<pre><code class="lang-json">{
  <span class="hljs-string">"extends"</span>: [<span class="hljs-string">"//"</span>],
  <span class="hljs-string">"tasks"</span>: {
    <span class="hljs-string">"build"</span>: {
      <span class="hljs-string">"outputs"</span>: [<span class="hljs-string">"dist/**"</span>]
    }
  }
}
</code></pre>
<p>This tells Turborepo that the build output for this app lives in the <code>dist</code> folder, allowing proper caching and build optimizations.</p>
<p>Ref: <a target="_blank" href="https://turborepo.com/docs/reference/package-configurations">https://turborepo.com/docs/reference/package-configurations</a></p>
<h2 id="heading-caching-in-turborepo">Caching in Turborepo</h2>
<p>One of the easiest ways to see Turborepo’s caching in action is to run:</p>
<pre><code class="lang-bash">npm run build
</code></pre>
<p>twice in a row. The second run will be significantly faster because Turborepo pulls results from cache. You can inspect cached files inside:</p>
<pre><code class="lang-bash">node_modules/.cache/turbo
</code></pre>
<p>Ref: <a target="_blank" href="https://turborepo.com/docs/crafting-your-repository/caching">https://turborepo.com/docs/crafting-your-repository/caching</a></p>
<h2 id="heading-adding-a-nodejs-app-to-the-monorepo">Adding a NodeJS App to the Monorepo</h2>
<p>Adding a Node.js backend app follows the same pattern as frontend apps. The main difference is that <code>tsc</code> alone doesn’t perform optimally in large monorepos, so bundlers like <strong>esbuild</strong> or <strong>tsup</strong> are preferred.</p>
<p>Create a new backend app inside <code>apps</code>:</p>
<pre><code class="lang-bash">mkdir apps/backend
<span class="hljs-built_in">cd</span> apps/backend
</code></pre>
<p>Initialize an empty TypeScript project:</p>
<pre><code class="lang-bash">npm init -y
npx tsc --init
</code></pre>
<p>Update <code>tsconfig.json</code> to extend the shared base config:</p>
<pre><code class="lang-json">{
  <span class="hljs-string">"extends"</span>: <span class="hljs-string">"@repo/typescript-config/base.json"</span>,
  <span class="hljs-string">"compilerOptions"</span>: {
    <span class="hljs-string">"lib"</span>: [<span class="hljs-string">"ES2015"</span>],
    <span class="hljs-string">"module"</span>: <span class="hljs-string">"CommonJS"</span>,
    <span class="hljs-string">"outDir"</span>: <span class="hljs-string">"./dist"</span>
  },
  <span class="hljs-string">"exclude"</span>: [<span class="hljs-string">"node_modules"</span>],
  <span class="hljs-string">"include"</span>: [<span class="hljs-string">"."</span>]
}
</code></pre>
<p>Install dependencies:</p>
<pre><code class="lang-bash">npm i express @types/express
</code></pre>
<p>Create <code>src/index.ts</code>:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> express <span class="hljs-keyword">from</span> <span class="hljs-string">"express"</span>;

<span class="hljs-keyword">const</span> app = express();

app.get(<span class="hljs-string">"/"</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  res.json({
    <span class="hljs-attr">message</span>: <span class="hljs-string">"hello world"</span>
  });
});

<span class="hljs-comment">// Without this the server never starts listening</span>
app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"listening on port 3000"</span>));
</code></pre>
<p>Add a <code>turbo.json</code> inside the backend app:</p>
<pre><code class="lang-json">{
  <span class="hljs-string">"extends"</span>: [<span class="hljs-string">"//"</span>],
  <span class="hljs-string">"tasks"</span>: {
    <span class="hljs-string">"build"</span>: {
      <span class="hljs-string">"outputs"</span>: [<span class="hljs-string">"dist/**"</span>]
    }
  }
}
</code></pre>
<p>Install <code>esbuild</code>:</p>
<pre><code class="lang-bash">npm install esbuild
</code></pre>
<p>Add a build script to <code>package.json</code>:</p>
<pre><code class="lang-json"><span class="hljs-string">"build"</span>: <span class="hljs-string">"esbuild src/index.ts --platform=node --bundle --outdir=dist"</span>
</code></pre>
<p>You can also create a shared <strong>common</strong> package that exports constants, types, or utility functions and use it across frontend and backend apps. This is one of the biggest advantages of a monorepo setup.</p>
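<p>As a sketch, a hypothetical <code>packages/common</code> could have a <code>package.json</code> like this (the <code>@repo/common</code> name and TypeScript entry point are illustrative, not part of the generated template):</p>
<pre><code class="lang-json">{
  "name": "@repo/common",
  "version": "0.0.0",
  "private": true,
  "exports": {
    ".": "./src/index.ts"
  }
}
</code></pre>
<p>Each app then adds <code>"@repo/common": "*"</code> to its dependencies, and after an <code>npm install</code> at the root it can import the shared constants and types like any other module.</p>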
<h2 id="heading-conclusion">Conclusion</h2>
<p>Monorepos are not mandatory, and they’re not always the right choice. But when your project has multiple applications that share code and evolve together, a monorepo can drastically improve developer experience, consistency, and build performance.</p>
<p>I’d love to hear different perspectives and connect with people who are exploring or already working with monorepos and developer tooling.</p>
<p>Thanks for taking the time to read!</p>
<p><strong>Signing off,<br />Masir Jafri</strong></p>
]]></content:encoded></item><item><title><![CDATA[Feature Engineering - The Art of Shaping Data]]></title><description><![CDATA[So yes, the topic is about Feature Engineering which is an essential part of Machine Learning Pipelines .
Since the topic is very broad and subjective so in this blog, we’ll see some typical definitions, examples, and a high-level intuition of featur...]]></description><link>https://blog.masirjafri.in/feature-engineering-the-art-of-shaping-data</link><guid isPermaLink="true">https://blog.masirjafri.in/feature-engineering-the-art-of-shaping-data</guid><category><![CDATA[MLTips]]></category><category><![CDATA[MLConcepts]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[feature engineering]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[#AIForBeginners ]]></category><category><![CDATA[Data Preprocessing]]></category><category><![CDATA[AI]]></category><category><![CDATA[data cleaning ]]></category><category><![CDATA[feature selection]]></category><category><![CDATA[feature extraction]]></category><category><![CDATA[Pca]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Learning Journey]]></category><dc:creator><![CDATA[Masir Jafri]]></dc:creator><pubDate>Sun, 09 Nov 2025 20:05:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762714824652/7e5eb995-b8d5-4418-83b4-3a0400bc63ee.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So yes, the topic is about <strong>Feature Engineering</strong> which is an essential part of Machine Learning Pipelines .</p>
<p>Since the topic is broad and subjective, in this blog we’ll cover some typical definitions, examples, and high-level intuition of feature engineering without going too deep. So let’s start from the very beginning.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715166991/419e1048-3d8d-4d0a-8717-494fb9065c27.jpeg" alt class="image--center mx-auto" /></p>
<h3 id="heading-what-are-features-and-feature-engineering">What Are Features and Feature Engineering?</h3>
<p>After hearing the word <em>feature</em>, the first question that comes to mind is <em>What exactly is a feature, and what is feature engineering?</em></p>
<p>In very simple terms, you can think of <strong>features as the columns of a dataset (except the label or target column)</strong>. <strong>Feature Engineering</strong>, in turn, is the process of transforming raw data into meaningful features that improve the performance of a machine learning model. Basically, it’s the <em>art of making data more understandable and useful</em> for ML algorithms.</p>
<p>There’s a popular saying that perfectly justifies its importance:</p>
<blockquote>
<p>Feature engineering isn’t just a process, it’s an art form. Because in the end, a bad algorithm with great features will always outperform a great algorithm with bad ones.</p>
</blockquote>
<h3 id="heading-the-four-categories-of-feature-engineering">The Four Categories of Feature Engineering</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715326071/2ee471f1-dc42-48f0-97b6-042db6fc13e1.png" alt class="image--center mx-auto" /></p>
<p>Feature engineering is broadly divided into four categories:</p>
<ol>
<li><p><strong>Feature Transformation</strong></p>
</li>
<li><p><strong>Feature Construction</strong></p>
</li>
<li><p><strong>Feature Selection</strong></p>
</li>
<li><p><strong>Feature Extraction</strong></p>
</li>
</ol>
<p>Let’s understand them one by one.</p>
<h2 id="heading-feature-transformation">Feature Transformation :</h2>
<p><strong>Feature Transformation</strong> means modifying existing features to make them more suitable for model training <em>without changing their actual meaning</em>. It focuses on <strong>cleaning and standardizing data</strong>, and is generally divided into these subparts:</p>
<h3 id="heading-a-missing-value-imputation">a) Missing Value Imputation</h3>
<p>Since machine learning models can’t handle missing values directly, we need to either <strong>fill (impute)</strong> or <strong>remove</strong> them. Common imputation strategies include:</p>
<ul>
<li><p><strong>Mean</strong>, <strong>Median</strong>, or <strong>Mode</strong> replacement</p>
</li>
<li><p><strong>Model-based or predictive imputation</strong></p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715610519/91600bf2-6071-4c90-8a37-d9dee6b861c5.jpeg" alt class="image--center mx-auto" /></p>
<p>In the image, you can see missing values filled with the column’s mean, which is a basic example of <em>Missing Value Imputation</em>.</p>
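<p>The same idea can be sketched in a few lines of plain Python (in practice you would reach for pandas or scikit-learn’s <code>SimpleImputer</code>):</p>

```python
from statistics import mean

# A column with missing values represented as None
ages = [25, 30, None, 40, None, 35]

# Compute the mean over the observed values only
observed = [a for a in ages if a is not None]
fill = mean(observed)  # (25 + 30 + 40 + 35) / 4 = 32.5

# Replace every missing entry with the column mean
imputed = [a if a is not None else fill for a in ages]
print(imputed)  # [25, 30, 32.5, 40, 32.5, 35]
```

<p>Median or mode imputation works the same way; only the fill statistic changes.</p>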
<h3 id="heading-b-handling-categorical-values">b) Handling Categorical Values</h3>
<p>Categorical features like <em>Gender</em>, <em>Country</em>, or <em>Rating</em> cannot be directly used in most ML models because they are non-numeric. So, we convert them into numerical form using encoding techniques.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715789900/bb777d7b-803e-4cb9-a226-0256526bc9ea.png" alt class="image--center mx-auto" /></p>
<p>As shown in the image, “Horse”, “Cat”, “Dog”, “Sheep”, and “Lion” are turned into [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], and [0, 0, 0, 0, 1] using the one-hot encoding method. That’s a perfect example of handling categorical values.</p>
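<p>A minimal Python sketch of one-hot encoding (libraries like pandas’ <code>get_dummies</code> or scikit-learn’s <code>OneHotEncoder</code> do this for you in practice):</p>

```python
categories = ["Horse", "Cat", "Dog", "Sheep", "Lion"]

def one_hot(value, categories):
    # 1 at the matching category's position, 0 everywhere else
    return [1 if value == c else 0 for c in categories]

print(one_hot("Cat", categories))  # [0, 1, 0, 0, 0]
```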
<h3 id="heading-c-outlier-detection">c) Outlier Detection</h3>
<p><strong>Outliers</strong> are extreme values that differ greatly from the rest of the data. If not handled properly, they can bias the model and mislead its learning process. Let’s understand this with an example:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762715947249/7f1d99bb-e841-4fbf-89e4-b274645b9ae3.png" alt class="image--center mx-auto" /></p>
<p>Imagine a class of 10 students, 9 of whom got placements with an average salary of <strong>₹5 LPA</strong>. Now add <em>Sharma Ji’s son</em>, who cracked a package of <strong>₹1.5 Cr/year</strong> (₹150 LPA). The class average jumps to <strong>₹19.5 LPA</strong>, which is misleading: the spike comes from one student alone. He is the outlier here, and that’s why outlier detection and handling are essential.</p>
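<p>The arithmetic is easy to verify, and it also shows why the median is often a more robust summary than the mean when outliers are present:</p>

```python
from statistics import mean, median

# Salaries in LPA: nine students at 5 LPA, one at 1.5 Cr/year (150 LPA)
salaries = [5] * 9 + [150]

print(mean(salaries))    # 19.5 -- pulled up by the single outlier
print(median(salaries))  # 5.0  -- barely affected by it
```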
<h3 id="heading-d-feature-scaling">d) Feature Scaling</h3>
<p><strong>Feature Scaling</strong> means bringing all numerical features to a similar range so that no single feature dominates others. In the example, we have:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762716511791/5045e952-e8b0-494c-b5de-dc2c76d58294.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Age:</strong> ranges between 30–50</p>
</li>
<li><p><strong>Income:</strong> ranges between 40,000–90,000</p>
</li>
</ul>
<p>The large range of income can heavily influence algorithms that depend on <strong>distance calculations</strong> (like KNN or K-Means). To fix this, we use techniques like:</p>
<p><strong>Standardization:</strong> It is a scaling technique where the values are centered around the mean with a unit standard deviation.</p>
<p><strong>Normalization:</strong> It is a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.</p>
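<p>Both techniques can be sketched in plain Python (scikit-learn’s <code>StandardScaler</code> and <code>MinMaxScaler</code> are the usual tools):</p>

```python
from statistics import mean, pstdev

incomes = [40_000, 55_000, 70_000, 90_000]

# Standardization: center at mean 0 with unit standard deviation
mu, sigma = mean(incomes), pstdev(incomes)
standardized = [(x - mu) / sigma for x in incomes]

# Normalization (min-max): rescale into the [0, 1] range
lo, hi = min(incomes), max(incomes)
normalized = [(x - lo) / (hi - lo) for x in incomes]

print(normalized)  # [0.0, 0.3, 0.6, 1.0]
```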
<h2 id="heading-feature-construction">Feature Construction :</h2>
<p><strong>Feature Construction</strong> means creating new features from existing ones to better capture underlying patterns. This step often depends on <strong>domain knowledge</strong>.</p>
<p>For example, if you have <em>weight</em> and <em>height</em> columns, you can create a new feature, a <strong>BMI</strong> column, that represents both factors. Here is one more example; below is the famous Titanic dataset:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762716851449/a8e26b89-de4c-4cf4-9ee4-d2e04a2d178d.png" alt class="image--center mx-auto" /></p>
<p>It includes columns such as <strong>SibSp</strong> (siblings/spouses aboard) and <strong>ParCh</strong> (parents/children aboard). In this case, we can combine them into a single column called <strong>family_members</strong>.</p>
<h2 id="heading-feature-selection">Feature Selection :</h2>
<p><strong>Feature Selection</strong> is the process of choosing only the most relevant features that contribute to model performance and removing the redundant or irrelevant ones.</p>
<p>We need to remember that :</p>
<blockquote>
<p>More features ≠ better accuracy</p>
<p>Think of it this way that <em>a diamond is formed from coal, but not every piece of coal becomes a diamond.</em></p>
</blockquote>
<p>Let’s build some intuition about feature selection with the help of the MNIST dataset:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762717131893/359f288a-7770-47da-808d-40ee0c62ef03.png" alt class="image--center mx-auto" /></p>
<p>In this dataset, each image is 28×28 pixels (784 features). However, many of these pixels are almost always zero (black background). By selecting only the relevant features (i.e., pixels that actually carry values), the model becomes faster and more efficient.</p>
<h2 id="heading-feature-extraction">Feature Extraction :</h2>
<p><strong>Feature Extraction</strong> transforms the original features into a new set of features that still represent the underlying information, but in a more compressed way. A great example of feature extraction in action is <strong>PCA (Principal Component Analysis)</strong>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762717401206/84680720-66bc-444d-a947-0f0665dc7c06.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762717420187/21ec32d5-b8de-48d0-8b91-34be832b38f8.png" alt class="image--center mx-auto" /></p>
<p>Here we took 4 numerical features (sepal length, sepal width, petal length, and petal width) from the <em>Iris</em> dataset. After applying PCA, those 4 features are reduced to 2 principal components, preserving most of the variance while reducing dimensionality.</p>
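<p>The mechanics can be sketched with NumPy on a few toy rows (scikit-learn’s <code>PCA</code> is what you’d normally use; the sample values below are just stand-ins for the four Iris columns):</p>

```python
import numpy as np

# Toy data: 6 samples x 4 features
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.1, 3.0, 5.9, 2.1],
    [6.5, 3.0, 5.8, 2.2],
])

# 1) center the data, 2) eigendecompose its covariance matrix,
# 3) project onto the two largest eigenvectors (principal components)
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
top2 = eigvecs[:, ::-1][:, :2]          # two largest components
X_reduced = Xc @ top2

print(X_reduced.shape)  # (6, 2): four features compressed to two
```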
<p><strong>Why use Feature Extraction?</strong></p>
<ul>
<li><p><strong>High dimensionality</strong> (too many features cause overfitting and slow training)</p>
</li>
<li><p><strong>Correlation/Redundancy</strong> (some features overlap in meaning)</p>
</li>
<li><p><strong>Visualization</strong> (reduces data to 2D/3D for easy pattern recognition)</p>
</li>
</ul>
<p>Now, you might be confused between Feature Selection and Feature Extraction. So let’s clear that up before moving to the conclusion.</p>
<p>The only difference between them is:</p>
<blockquote>
<p><strong>Feature Selection:</strong> I’ll pick the best.<br /><strong>Feature Extraction:</strong> I’ll create the best.</p>
</blockquote>
<p>In <strong>Feature Selection</strong>, we <em>keep</em> only the most useful features, while in <strong>Feature Extraction</strong>, we <em>combine or transform</em> existing features into new ones. Feature Selection aims to keep relevant features; Feature Extraction aims to reduce dimensions and compress information.</p>
<h2 id="heading-conclusion">Conclusion :</h2>
<p>Feature Engineering is more than just a technical process. It is a subjective craft that bridges data and algorithms in Machine Learning. Good features can turn even a simple model into a powerful one.</p>
<p>I would love to hear different perspectives and connect with people exploring this field.</p>
<p>Thank you for reading!</p>
<p><strong>Signing off,<br />Masir Jafri</strong></p>
]]></content:encoded></item><item><title><![CDATA[ML vs DL : My Basic Understanding as a Beginner]]></title><description><![CDATA[So let’s get started with some background. Artificial Intelligence (AI) is not something newly invented. Research in AI started way back in the 1950s–1960s. The major problem at that time was that:

Data wasn't available in large amounts

Collecting ...]]></description><link>https://blog.masirjafri.in/ml-vs-dl-my-basic-understanding-as-a-beginner</link><guid isPermaLink="true">https://blog.masirjafri.in/ml-vs-dl-my-basic-understanding-as-a-beginner</guid><category><![CDATA[#ArtificialIntelligence ]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Deep Learning]]></category><category><![CDATA[neural networks]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[techblog]]></category><category><![CDATA[Artificial Intelligence]]></category><dc:creator><![CDATA[Masir Jafri]]></dc:creator><pubDate>Thu, 06 Nov 2025 14:36:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439725247/a4740ae9-4611-40bf-bcd5-16f14e698b6c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So let’s get started with some background. Artificial Intelligence (AI) is not something newly invented. Research in AI started way back in the 1950s–1960s. The major problem at that time was that:</p>
<ul>
<li><p>Data wasn't available in large amounts</p>
</li>
<li><p>Collecting data was slow and expensive</p>
</li>
<li><p>Hardware was weak and not powerful enough</p>
</li>
</ul>
<p>Then came the 2000s–2010s.<br />The internet became common, smartphones came into existence, data started growing very rapidly, and hardware evolved as well (GPUs, TPUs, optimized chips, cloud storage, etc.). Today even mobile phones have <strong>8–12 GB RAM and terabytes of storage</strong> which was unthinkable back then.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762437872716/05fdcac4-d319-4d5b-a161-17a9c5a80b28.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-what-is-machine-learning">What is Machine Learning?</h2>
<p><strong>So now let's understand ML first, starting with the typical definition:</strong></p>
<blockquote>
<p>It is the methodology of data analysis used to find the hidden insights and patterns from the data without explicitly programming it.</p>
</blockquote>
<p>I know it is the typical definition, but I didn't find any better way to explain it. The important phrase to notice here is <strong>“without explicitly programming.”</strong></p>
<p>Let’s understand this with an example. You want to classify emails as <strong>spam</strong> or <strong>ham (genuine)</strong>.<br />If you use traditional programming, you’d write a long list of <code>if-else</code> rules like:</p>
<ul>
<li><p>If mail has <em>too many links</em> → spam</p>
</li>
<li><p>If sender is unknown → spam</p>
</li>
<li><p>If mail contains “check it out” 4+ times → spam</p>
</li>
</ul>
<p>But this manual approach has two big problems:</p>
<ol>
<li><p><strong>You cannot imagine all possible spam patterns in advance</strong></p>
</li>
<li><p><strong>Spammers can easily bypass your rules</strong> (e.g., replace “check it out” with “have a look”)</p>
</li>
</ol>
<p>Machine Learning removes this headache. Instead of writing rules, you feed the model <strong>examples of spam and not-spam emails</strong>, and the model learns the pattern on its own. It covers almost every case and keeps working even when the format of spam mail changes.</p>
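<p>Here is a toy sketch of that idea with scikit-learn (the four emails below are a made-up mini-dataset, purely for illustration). Note that no if-else rules appear anywhere, the model learns which words signal spam from the labelled examples:</p>
<pre><code class="lang-python">from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training set: 1 = spam, 0 = ham
emails = [
    "win a free prize now click here",
    "limited offer check it out click now",
    "meeting rescheduled to friday at noon",
    "please review the attached project report",
]
labels = [1, 1, 0, 0]

# Bag-of-words features + Naive Bayes classifier, no hand-written rules
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

preds = model.predict(["free prize click now", "see you at the meeting"])
print(preds)
</code></pre>
<p>Because the model learns from word statistics rather than fixed phrases, swapping “check it out” for “have a look” no longer breaks the filter the way it would break a rule-based one.</p>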
<h2 id="heading-what-is-deep-learning">What is Deep Learning?</h2>
<p>We can consider Deep Learning simply as a subset of Machine Learning, but it becomes useful when <strong>ML reaches its limits</strong>. And here comes the typical definition.</p>
<p><strong>Definition:</strong></p>
<blockquote>
<p>“Deep Learning is a type of Machine Learning that uses multiple layers of neural networks to automatically learn patterns from large amounts of data, without us manually defining the features.”</p>
</blockquote>
<p>The limitation of Machine Learning is that we must <em>manually decide</em> the features, while in DL the model <em>automatically learns</em> the features.</p>
<p>Example:<br />If we want to classify images of cats and dogs using ML, we need to define features like:</p>
<ul>
<li><p>Ears, Size, Color, etc. for cats</p>
</li>
<li><p>Tail Type, Fur Pattern, etc. for dogs</p>
</li>
</ul>
<p>But the problem is that there are too many variations, i.e., different breeds, sizes, colors, lighting, angles, etc. So manually defining features becomes practically impossible in this case.</p>
<p>Deep Learning solves this by using <strong>neural networks with many layers</strong>, where each layer learns patterns on its own like edges, shapes, textures, etc.<br />The more layers → the deeper the network → the better it understands complex data like images, audio, speech, video, etc.</p>
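<p>A minimal, runnable sketch of a layered network (using scikit-learn's <code>MLPClassifier</code>, a small fully-connected network, rather than a full deep-learning framework like PyTorch or TensorFlow) on tiny 8×8 digit images:</p>
<pre><code class="lang-python">from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale digit images, flattened into 64 raw pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers: earlier layers pick up simple patterns,
# later layers combine them into more abstract ones
net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
net.fit(X_train, y_train)

accuracy = net.score(X_test, y_test)
print(round(accuracy, 2))
</code></pre>
<p>We never told the network what a “loop” or an “edge” looks like; the layers learned useful intermediate representations from raw pixels on their own.</p>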
<p>Now the question may arise: if Deep Learning models are that efficient, why do we still need Machine Learning models? Here is the answer:</p>
<blockquote>
<p>“You don’t use a sword where the work of a needle is required.”</p>
</blockquote>
<p>Deep Learning requires very large datasets, high computational power, and long training times. So yes, DL is powerful, but <strong>not always necessary</strong>. For small datasets and simple problems, ML is still the better, faster, cheaper choice.</p>
<p>Also, unlike ML, DL models usually <strong>get better as data increases</strong>, while ML models often hit a limit where more data doesn’t improve performance.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1762439165390/2fbf42d2-ff0b-44e3-b874-7e31340a4915.webp" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion:</h2>
<p>Both ML and DL have their own benefits, challenges, and ideal use cases. ML is simple, faster, and works well with small data while DL is powerful, automatic, and great for high dimensional data like images, audio, video, text but it requires huge data and powerful hardware.</p>
<p>I would love to hear different perspectives and connect with people exploring this field.</p>
<p><strong>Signing off,<br />Masir Jafri</strong></p>
]]></content:encoded></item></channel></rss>