CI/CD Pipeline Best Practices | Pillai Infotech LLP

A client came to us last year with a "CI/CD pipeline" that was really just a Jenkins server running a bash script that did git pull && npm run build && scp -r dist/ prod-server:/var/www/. No tests. No staging. No rollback plan. They averaged one production incident per week.

Three months later, the same team was deploying 15 times a day with automated tests, canary releases, and one-click rollbacks. Production incidents dropped to two per month. The pipeline wasn't magic — it was discipline, layered in the right order.

At Pillai Infotech, we've built and fixed CI/CD pipelines for teams ranging from 3-person startups to 200-engineer enterprise shops. The mistakes are almost always the same. So are the patterns that fix them.

The Pipeline Reality Check

Before we talk about best practices, let's talk about what actually matters. Your pipeline has one job: get code from a developer's machine to production safely and quickly. Everything in the pipeline either contributes to safety, speed, or both. If it does neither, it's waste.

The Two Numbers That Matter

Every pipeline optimization ultimately serves two metrics:

Lead time: How long from code commit to running in production? Elite teams do this in under 1 hour. Most teams take days or weeks.
Change failure rate: What percentage of deployments cause a production issue? Elite teams are under 5%. Struggling teams hit 30-50%.

Intuition says these two numbers trade off against each other: ship faster, break more. In practice, the teams that deploy most frequently also have the lowest failure rates. That's not a coincidence. Small, frequent changes are inherently lower risk than big-bang releases.

What we see at Pillai Infotech: Teams that deploy less than once a week almost always have a change failure rate above 20%. Teams deploying multiple times per day are consistently under 8%. The pipeline makes the difference — not more careful developers, but better safety nets.
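To make the two numbers concrete, here is a minimal Python sketch that computes them from deployment records. The `Deployment` record and its fields are hypothetical. In practice the data would come from your CI/CD system and incident tracker. The sketch uses the median for lead time so that one stalled change doesn't skew the figure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    commit_time: datetime   # when the change was committed
    deploy_time: datetime   # when it was running in production
    caused_incident: bool   # did this deploy trigger a production issue?


def lead_time(deploys: list[Deployment]) -> timedelta:
    """Median commit-to-production time across deployments."""
    return median(d.deploy_time - d.commit_time for d in deploys)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Fraction of deployments that caused a production issue."""
    return sum(d.caused_incident for d in deploys) / len(deploys)


# Hypothetical history: four deploys, one of which caused an incident.
t0 = datetime(2026, 1, 5, 9, 0)
history = [
    Deployment(t0, t0 + timedelta(minutes=40), False),
    Deployment(t0, t0 + timedelta(minutes=55), False),
    Deployment(t0, t0 + timedelta(hours=3), True),
    Deployment(t0, t0 + timedelta(minutes=30), False),
]
print(lead_time(history))            # 0:47:30
print(change_failure_rate(history))  # 0.25
```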

CI: Getting the Basics Right

Continuous Integration sounds simple: developers merge code frequently, and every merge triggers automated checks. In practice, most teams get this wrong in subtle ways.

The 10-Minute Rule

Your CI pipeline should complete in under 10 minutes. Not 10 minutes as a goal — 10 minutes as a hard constraint. Here's why: if CI takes 30 minutes, developers stop waiting for it. They stack PRs, context-switch, and by the time CI fails, they've forgotten what they were doing. A 10-minute pipeline stays in the developer's active context.

How to hit 10 minutes:

Parallelize tests: Split your test suite across multiple runners. Most CI platforms support this natively.
Cache aggressively: Dependencies, Docker layers, compiled assets — cache anything that doesn't change between builds.
Run only what's affected: If a PR only touches the billing module, don't run the entire 40-minute test suite. Use tools like Nx (nx affected), Turborepo, or custom change detection.
Pre-build base images: Don't install system dependencies on every CI run. Build a base Docker image weekly and use it as your CI runtime.
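If you build your own change detection instead of using Nx or Turborepo, the core idea is a reverse-dependency walk. Start from the modules a PR touches, then add every module that depends on them, directly or transitively. Below is a minimal Python sketch. It assumes one module per top-level directory and a hand-maintained `DEPENDS_ON` map; both are hypothetical.

```python
from collections import deque

# Hypothetical monorepo layout: module -> modules it depends on.
DEPENDS_ON = {
    "billing": {"shared"},
    "checkout": {"billing", "shared"},
    "search": {"shared"},
    "shared": set(),
}


def owning_module(path: str) -> str:
    """Assume each top-level directory is a module, e.g. billing/invoice.py."""
    return path.split("/", 1)[0]


def affected_modules(changed_files: list[str]) -> set[str]:
    """Changed modules plus everything that (transitively) depends on them."""
    # Invert the graph: module -> modules that depend on it.
    dependents: dict[str, set[str]] = {m: set() for m in DEPENDS_ON}
    for module, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(module)

    # Files outside known modules (e.g. README.md) are ignored in this sketch.
    touched = (owning_module(f) for f in changed_files)
    affected = {m for m in touched if m in DEPENDS_ON}
    queue = deque(affected)
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(affected_modules(["billing/invoice.py"])))  # ['billing', 'checkout']
print(sorted(affected_modules(["shared/utils.py"])))     # ['billing', 'checkout', 'search', 'shared']
```

A change to `shared` fans out to every module, which is why heavily shared code tends to dominate CI time in monorepos.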

Branch Strategy

The branching model directly impacts your CI effectiveness:

Strategy	Best For	CI Impact	Risk
Trunk-based	Experienced teams, microservices	Fast — always testing against main	Requires strong test coverage + feature flags
GitHub Flow	Most teams, SaaS products	Good balance of speed + safety	Merge pain if branches live too long
GitFlow	Versioned releases, mobile apps	Slow — multiple integration points	Branch management overhead
Release branches	Regulated industries, compliance	Controlled — explicit promotion gates	Cherry-pick drift between branches

Our recommendation for most teams: GitHub Flow with short-lived branches (merged within 1-2 days). It's simple, well-supported, and keeps integration pain low. Trunk-based development is better but requires discipline most teams haven't built yet.

Testing Strategy That Scales

The testing pyramid is well known. The problem is that most teams build it upside down: heavy on slow integration tests, light on fast unit tests. Here's what we recommend instead:

The Practical Testing Pyramid

Unit tests (70% of tests, <2 min): Test individual functions and classes. Mock external dependencies. These should be fast enough to run on every save in the IDE.
Integration tests (20%, <5 min): Test service boundaries such as API endpoints, database queries, and message consumers. Use Testcontainers to run real databases instead of mocks.
End-to-end tests (10%, <10 min): Critical user flows only. Login, checkout, core business workflow. Keep this set ruthlessly small — every E2E test you add slows your pipeline.
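Here is what "mock external dependencies" looks like at the unit level, as a minimal Python sketch using the standard library's unittest.mock. The `charge_order` function and its `gateway.charge` call are hypothetical stand-ins for code that talks to an external payment API.

```python
import unittest
from unittest.mock import Mock


def charge_order(gateway, order_total_cents: int) -> str:
    """Charge via an external payment gateway; reject non-positive totals."""
    if order_total_cents <= 0:
        raise ValueError("order total must be positive")
    return gateway.charge(amount=order_total_cents)


class ChargeOrderTest(unittest.TestCase):
    def test_charges_gateway_with_total(self):
        gateway = Mock()  # stands in for the real payment client
        gateway.charge.return_value = "txn_123"
        self.assertEqual(charge_order(gateway, 4999), "txn_123")
        gateway.charge.assert_called_once_with(amount=4999)

    def test_rejects_zero_without_calling_gateway(self):
        gateway = Mock()
        with self.assertRaises(ValueError):
            charge_order(gateway, 0)
        gateway.charge.assert_not_called()
```

Run it with `python -m unittest`. No network calls are made, so tests like these stay in the millisecond range.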

What to Test in CI vs. Separately

In CI (every PR): Unit tests, linting, type checking, security scanning (SAST), integration tests for changed modules.

Post-merge (async): Full integration suite, E2E tests, performance regression tests, accessibility checks.

Scheduled (nightly/weekly): Dependency vulnerability scans, license compliance, chaos tests, load tests.

Lesson from a client project: A healthcare SaaS client had a 45-minute CI pipeline because every PR ran 2,000 E2E tests. We restructured: unit tests and affected-module integration tests in CI (8 minutes), full E2E suite post-merge as a deployment gate. Deployment frequency went from 2/week to 4/day. No increase in production bugs.

Test Quality Over Test Quantity

We've seen codebases with 95% code coverage that still had production bugs weekly. Coverage measures whether code was executed, not whether it was tested meaningfully. Focus on:

Mutation testing: Tools like Stryker or PIT introduce bugs into your code and check if tests catch them. A codebase with 80% coverage and 70% mutation score is better tested than one with 95% coverage and 40% mutation score.
Boundary testing: Test edge cases — null inputs, empty arrays, maximum values, concurrent access, timeout scenarios.
Contract testing: For microservices, use Pact or similar tools to verify that service interfaces stay compatible.
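To see why mutation score and coverage diverge, here is a toy Python illustration of the idea behind tools like Stryker and PIT. It does not reflect how either tool is implemented. It flips one comparison operator, re-runs the tests, and counts how many mutants the tests "kill". Both test functions below execute every line of the hypothetical `is_adult`, so each achieves full coverage. Only the one with boundary cases catches the mutant. Requires Python 3.9+ for `ast.unparse`.

```python
import ast

SOURCE = """
def is_adult(age):
    return age >= 18
"""

# A tiny subset of the operator swaps real mutation tools apply.
SWAPS = {ast.GtE: ast.Gt, ast.Gt: ast.GtE, ast.LtE: ast.Lt,
         ast.Lt: ast.LtE, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


def mutants(source: str):
    """Yield source variants, each with exactly one comparison operator swapped."""
    tree = ast.parse(source)
    sites = [(node, i) for node in ast.walk(tree) if isinstance(node, ast.Compare)
             for i, op in enumerate(node.ops) if type(op) in SWAPS]
    for node, i in sites:
        original = node.ops[i]
        node.ops[i] = SWAPS[type(original)]()  # mutate in place
        yield ast.unparse(tree)
        node.ops[i] = original                 # restore before the next mutant


def mutation_score(source: str, func_name: str, tests) -> float:
    """Fraction of mutants for which `tests` returns False (mutant killed)."""
    killed = []
    for mutant_src in mutants(source):
        namespace = {}
        exec(compile(mutant_src, "<mutant>", "exec"), namespace)
        killed.append(not tests(namespace[func_name]))
    return sum(killed) / len(killed)


def weak_tests(is_adult) -> bool:
    return is_adult(30) and not is_adult(5)


def strong_tests(is_adult) -> bool:
    # Boundary cases: exactly 18 and just below it.
    return is_adult(30) and not is_adult(5) and is_adult(18) and not is_adult(17)


print(mutation_score(SOURCE, "is_adult", weak_tests))    # 0.0 (mutant survives)
print(mutation_score(SOURCE, "is_adult", strong_tests))  # 1.0 (mutant killed)
```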

CD: Deployment Patterns That Protect Production

Continuous Delivery means every commit could go to production. Continuous Deployment means every commit does go to production. Most teams should start with Delivery and graduate to Deployment once confidence is high.

Progressive Delivery

The safest way to deploy is gradually. Progressive delivery routes increasing percentages of traffic to new code while monitoring for errors:

Canary deployment: Route 1-5% of traffic to the new version. Monitor error rates, latency, and business metrics for 15-30 minutes. If healthy, increase to 25%, then 50%, then 100%.
Blue-green deployment: Run two identical environments. Deploy to the idle one, verify, then switch traffic. Instant rollback by switching back.
Feature flags: Deploy code to all servers but toggle features on for specific users or percentages. Decouples deployment from release.
Shadow deployment: Send a copy of production traffic to the new version but discard its responses, so users are served only by the live version. Compare outputs and performance between the two.
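Canary routing and percentage-based feature flags both need a stable way to decide who gets the new code. A common technique is to hash a user ID into one of 100 buckets. The same user always gets the same answer, and raising the percentage only adds users; nobody flips back. Below is a minimal Python sketch. The function names and the choice to salt by feature name are our assumptions, not any particular flag vendor's API.

```python
import hashlib


def bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if this user falls in the first `percent` buckets for this feature.

    Salting with the feature name means each feature samples a different
    slice of users, so the same 5% don't get every experiment.
    """
    return bucket(user_id, feature) < percent


users = [f"user-{i}" for i in range(10_000)]
share = sum(in_rollout(u, "new-checkout", 25) for u in users) / len(users)
print(f"{share:.1%} of users see new-checkout")  # prints a value close to 25%
```

Because buckets are fixed, moving from 5% to 25% keeps the original 5% in the rollout. This is what lets you watch the same cohort's metrics as you widen the canary.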

When to Use Each Pattern

Pattern	Setup Complexity	Rollback Speed	Best For
Canary	Medium	Seconds (route back)	API services, high-traffic apps
Blue-Green	Medium-High	Seconds (switch environments)	Monoliths, database-coupled apps
Feature Flags	Low-Medium	Instant (toggle flag)	Product features, A/B testing
Shadow	High	N/A (not serving)	ML models, critical path changes

Database Migrations in CD

Database changes are the most dangerous part of any deployment. We follow this rule at Pillai Infotech: every database migration must be backward-compatible. The new code must work with both the old and new schema, because during a rolling deployment, both versions run simultaneously.

The expand-contract pattern:

Expand: Add the new column/table. Don't remove or rename anything yet. Deploy code that writes to both old and new locations.
Migrate: Backfill data from old to new location. Verify consistency.
Contract: Remove old column/table after all code exclusively uses the new location. This is a separate deployment.
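Here is the pattern applied to renaming `name` to `full_name`, as a minimal Python sketch against an in-memory SQLite database. The table, columns, and `create_user_new_code` helper are hypothetical. A real migration would run through your migration tool, with each phase shipped as its own deployment rather than as one script.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES ('Asha'), ('Ravi')")  # written by old code

# Expand: add the new column. The old column stays, so old code keeps working.
db.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def create_user_new_code(name: str) -> None:
    # New code dual-writes to both the old and the new column during the transition.
    db.execute("INSERT INTO users (name, full_name) VALUES (?, ?)", (name, name))


create_user_new_code("Meera")

# Migrate: backfill rows written before the expand step, then verify consistency.
db.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")
mismatches = db.execute(
    "SELECT COUNT(*) FROM users WHERE full_name IS NULL OR full_name != name"
).fetchone()[0]
print("rows out of sync:", mismatches)  # rows out of sync: 0

# Contract (a separate, later deployment, once no code reads `name`):
#   ALTER TABLE users DROP COLUMN name   -- SQLite needs 3.35+ for DROP COLUMN
```

At every point in this sequence, both the old code (reading `name`) and the new code work against the same schema. That is what makes a rollback safe.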

Security in the Pipeline

Security scanning isn't optional — it's a pipeline stage. The goal is to catch vulnerabilities before they reach production, without slowing down deployments.

The DevSecOps Pipeline

Pre-commit: Secret scanning (gitleaks, trufflehog) — catch API keys and passwords before they enter version control.
CI stage: SAST (static analysis) with Semgrep, SonarQube, or CodeQL. Runs in 2-3 minutes for most codebases.
Build stage: Container image scanning with Trivy or Snyk Container. Dependency vulnerability checking with npm audit, safety (Python), or OWASP Dependency-Check.
Pre-deploy: DAST (dynamic analysis) against staging. Tools like OWASP ZAP or Burp Suite scan running applications for vulnerabilities.
Post-deploy: Runtime security monitoring with Falco, cloud-native services (GuardDuty, Security Command Center), or Wiz for cloud security posture.
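Dedicated scanners such as gitleaks and trufflehog ship large, maintained rule sets, but the core of a pre-commit secret check is pattern matching over the text being committed. The toy Python sketch below shows that idea. It is not a replacement for those tools, and its three-rule set is a small illustrative assumption.

```python
import re

# A few well-known credential shapes (illustrative, far from exhaustive).
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "GitHub token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for each suspected secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


staged = 'region = "us-east-1"\naws_key = "AKIAABCDEFGHIJKLMNOP"\n'
print(scan(staged))  # [(2, 'AWS access key ID')]
```

A hook would run this over the staged diff and exit non-zero if `scan` returns anything, which stops the commit before the secret enters history.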

Handling Security Findings

Not every vulnerability is a pipeline blocker. We use a severity-based approach:

Critical/High: Block the pipeline. No deployment until resolved.
Medium: Flag in PR review. Must be resolved within the sprint.
Low: Track in backlog. Review monthly.

The key is zero tolerance for new critical issues while being pragmatic about existing technical debt. Trying to fix everything at once paralyzes the pipeline.
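The severity rules and the "new versus existing" distinction fit in a few lines. Below is a minimal Python sketch. It assumes findings have already been parsed from your scanners' output, and that accepted existing issues are kept in a baseline keyed by rule and location. Both structures, and the rule IDs in the example, are hypothetical.

```python
from dataclasses import dataclass

BLOCKING = {"CRITICAL", "HIGH"}


@dataclass(frozen=True)
class Finding:
    rule_id: str   # scanner rule or advisory identifier
    location: str  # file or package the finding applies to
    severity: str  # CRITICAL, HIGH, MEDIUM or LOW


def gate(findings: list[Finding],
         baseline: set[tuple[str, str]]) -> tuple[bool, dict[str, list[Finding]]]:
    """Return (may_proceed, findings routed by action).

    Only *new* critical/high findings block. Baselined ones are known
    technical debt and go to the backlog with the low-severity items.
    """
    routed: dict[str, list[Finding]] = {"block": [], "pr_review": [], "backlog": []}
    for f in findings:
        known = (f.rule_id, f.location) in baseline
        if f.severity in BLOCKING and not known:
            routed["block"].append(f)      # stop the deployment
        elif f.severity == "MEDIUM":
            routed["pr_review"].append(f)  # resolve within the sprint
        else:
            routed["backlog"].append(f)    # low, or already-known debt
    return (not routed["block"], routed)


baseline = {("rule-001", "package-lock.json")}
findings = [
    Finding("rule-001", "package-lock.json", "HIGH"),  # known debt
    Finding("rule-002", "Dockerfile", "CRITICAL"),     # new: blocks
    Finding("rule-003", "api/orders.py", "MEDIUM"),
]
ok, routed = gate(findings, baseline)
print(ok)                                    # False
print([f.rule_id for f in routed["block"]])  # ['rule-002']
```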

For more on building secure software at speed, see our responsible AI development guide and Cloud & DevOps services.

Rollback Strategies That Actually Work

Every deployment plan needs a rollback plan. The question isn't if you'll need to roll back — it's when, and whether you're prepared.

Automated Rollback Triggers

Define automated rollback conditions before deploying:

Error rate threshold: If 5xx errors exceed 2% of requests within 5 minutes of deploy, roll back automatically.
Latency threshold: If p95 latency increases by more than 50%, roll back.
Business metric threshold: If conversion rate drops by more than 10%, roll back. This catches issues that don't manifest as errors.
Health check failures: If more than 20% of instances fail health checks, halt the rollout and roll back.
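An automated metrics check boils down to comparing the new version against the thresholds above. Below is a minimal Python sketch of that comparison. The `Metrics` fields are hypothetical. The sketch treats "increases by 50%" and "drops by 10%" as relative to the current production baseline. A real check would query your monitoring system over the post-deploy window rather than take numbers directly.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    requests: int
    errors_5xx: int
    p95_latency_ms: float
    conversion_rate: float
    healthy_instances: int
    total_instances: int


def rollback_reasons(baseline: Metrics, canary: Metrics) -> list[str]:
    """Return the rollback conditions that fired (empty list means healthy)."""
    reasons = []
    if canary.requests and canary.errors_5xx / canary.requests > 0.02:
        reasons.append("5xx error rate above 2%")
    if canary.p95_latency_ms > baseline.p95_latency_ms * 1.5:
        reasons.append("p95 latency up more than 50%")
    if canary.conversion_rate < baseline.conversion_rate * 0.9:
        reasons.append("conversion rate down more than 10%")
    unhealthy = canary.total_instances - canary.healthy_instances
    if canary.total_instances and unhealthy / canary.total_instances > 0.2:
        reasons.append("more than 20% of instances failing health checks")
    return reasons


baseline = Metrics(50_000, 100, 180.0, 0.031, 20, 20)
canary = Metrics(2_000, 60, 210.0, 0.030, 2, 2)
print(rollback_reasons(baseline, canary))  # ['5xx error rate above 2%']
```

Returning every triggered reason, rather than a bare boolean, gives the on-call engineer the "why" in the pipeline log.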

Types of Rollback

Deployment rollback: Redeploy the previous version. Works for stateless services. Takes 2-5 minutes.
Traffic rollback: Shift traffic back to the old version (canary/blue-green). Instant.
Feature flag rollback: Disable the flag. Instant, no redeployment needed.
Database rollback: The hardest kind. If your migration was backward-compatible (expand-contract), the old code still works. If it wasn't, you're in trouble.

War story: An e-commerce client deployed a database migration that renamed a column. The migration was not backward-compatible. The old code crashed. The rollback ran the "down" migration and renamed the column back — but 3 minutes of writes during the failed deployment were lost because the new code had been writing to the renamed column. Cost: $47K in lost orders. The fix was simple: expand-contract migration. The lesson was expensive.

CI/CD Platform Comparison (2026)

The platform you choose matters less than how you use it. That said, here's our honest assessment from using all of these in production:

Platform	Best For	Pricing	Our Take
GitHub Actions	GitHub-native teams	2,000 free min/mo, $0.008/min	Best ecosystem. YAML can get complex. Our default choice.
GitLab CI	All-in-one platform teams	400 free min/mo, $10/user	Best integrated experience. Less marketplace flexibility.
CircleCI	Docker-heavy workflows	6,000 free credits/mo	Fast, great caching. Orbs simplify config. Smaller ecosystem.
Jenkins	Max flexibility, self-hosted	Free (self-hosted costs)	Infinitely customizable. Maintenance burden is real. Use only if you must.
ArgoCD	Kubernetes-native GitOps	Free (open-source)	Best for K8s deployments. Not a CI tool — pair with GH Actions or GitLab.
Dagger	Pipeline-as-code purists	Free (open-source)	Write pipelines in Go/Python/TypeScript. Portable across CI platforms. Rising star.

For Kubernetes deployments, we recommend a split approach: GitHub Actions for CI (build, test, push images) + ArgoCD for CD (GitOps-based deployment). This gives you the best of both worlds — a great CI experience and declarative, auditable deployments. Read more in our Terraform guide and DevOps best practices article.

Pipeline Anti-Patterns We See Constantly

1. The "Works on My Machine" Pipeline

CI runs in a different environment than local development. Developers write code, push, CI fails, they fix the CI issue (not the code issue), push again. This is a symptom of environment drift.

Fix: Use the same Docker image for local development and CI. Dev containers, Nix, or a shared Dockerfile that both local dev and CI use as their base environment.

2. The Flaky Test Graveyard

Tests that pass 90% of the time. Teams learn to ignore them. They re-run the pipeline until it goes green. This erodes trust in the entire test suite.

Fix: Quarantine flaky tests immediately. Track flakiness rates. A test that fails more than 1% of runs without a code change goes into quarantine. Fix or delete quarantined tests within one sprint.
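Tracking flakiness requires a working definition of "failed without a code change". One simple approach counts a failure as flaky if the same test also passed on the same commit. Below is a Python sketch under our own assumptions: a flat list of test runs tagged with the commit SHA, plus the 1% threshold from above.

```python
from collections import defaultdict

# One record per test execution: (test_name, commit_sha, passed).
Run = tuple[str, str, bool]


def flaky_rates(runs: list[Run]) -> dict[str, float]:
    """Share of each test's runs that failed on a commit where it also passed."""
    outcomes = defaultdict(lambda: defaultdict(list))  # test -> sha -> [passed, ...]
    for test, sha, passed in runs:
        outcomes[test][sha].append(passed)

    rates = {}
    for test, by_sha in outcomes.items():
        total = sum(len(results) for results in by_sha.values())
        # Failures on a commit where the test also passed can't be blamed on code.
        flaky_failures = sum(
            results.count(False) for results in by_sha.values() if True in results
        )
        rates[test] = flaky_failures / total
    return rates


def quarantine(runs: list[Run], threshold: float = 0.01) -> set[str]:
    return {test for test, rate in flaky_rates(runs).items() if rate > threshold}


runs = [("test_login", "abc123", True)] * 95 + [("test_login", "abc123", False)] * 5
runs += [("test_search", "abc123", True)] * 100
runs += [("test_cart", "def456", False)] * 3  # fails every time: a real bug, not a flake
print(sorted(quarantine(runs)))  # ['test_login']
```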

3. The Snowflake Pipeline

Every team builds their own pipeline from scratch. No shared patterns, no reusable components. 15 teams, 15 different ways to deploy.

Fix: Build a shared pipeline library (GitHub Actions reusable workflows, GitLab CI templates, Jenkins shared libraries). Teams customize parameters, not pipeline structure. This is what platform engineering is all about.

4. The Manual Gate That Always Opens

A manual approval step before production deploy. Sounds safe. In practice, the approver rubber-stamps everything because they can't meaningfully review deployments at the pipeline level.

Fix: Replace manual gates with automated quality gates — test pass rates, security scan results, performance benchmarks. If you need human oversight, put it in the PR review, not the deploy pipeline.

5. The "Deploy on Friday" Culture

If deploying on Friday feels risky, your pipeline is broken. A healthy pipeline makes any deploy safe because every deploy is small, tested, and rollback-ready.

Fix: Deploy constantly. The safest time to deploy is always "now" when your pipeline has proper canary releases and automated rollbacks.

Putting It All Together: A Reference Pipeline

Here's the pipeline structure we implement most often for web applications at Pillai Infotech:

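# Simplified GitHub Actions workflow
# .github/workflows/deploy.yml
# Registry login and kubectl cluster credentials are omitted for brevity.

on:
  push:
    branches: [main]
  pull_request:

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci

      # Parallel quality checks. Wait on each PID: a bare `wait` exits 0
      # and would hide lint or typecheck failures.
      - run: |
          npm run lint & lint_pid=$!
          npm run typecheck & tc_pid=$!
          wait "$lint_pid" && wait "$tc_pid"
      - run: npm run test:unit -- --coverage
      - run: npm run test:integration

      # Security scanning. Quote the severity list (an unquoted comma ends the
      # value inside a flow mapping) and set exit-code so findings fail the job.
      # Pin this action to a release tag or commit SHA instead of @master.
      - uses: aquasecurity/trivy-action@master
        with: { scan-type: fs, severity: 'CRITICAL,HIGH', exit-code: '1' }

  build:
    needs: ci
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Tag the image with the same name that is pushed
      - run: docker build -t registry/app:${{ github.sha }} .
      - run: docker push registry/app:${{ github.sha }}

  deploy-staging:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: staging
    env:
      STAGING_URL: ${{ vars.STAGING_URL }}  # assumes a STAGING_URL configuration variable
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: npm }
      - run: npm ci
      - run: kubectl set image deploy/app app=registry/app:${{ github.sha }}
      - run: kubectl rollout status deploy/app --timeout=300s
      - run: npm run test:e2e -- --base-url=$STAGING_URL

  deploy-production:
    needs: deploy-staging
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4  # canary manifests and scripts live in the repo

      # Canary: 5% traffic for 10 minutes
      - run: kubectl apply -f canary-5pct.yml
      - run: sleep 600 && ./scripts/check-metrics.sh

      # Progressive rollout
      - run: kubectl apply -f canary-25pct.yml
      - run: sleep 300 && ./scripts/check-metrics.sh

      # Full rollout
      - run: kubectl set image deploy/app app=registry/app:${{ github.sha }}
      - run: kubectl rollout status deploy/app --timeout=600s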

This isn't a template to copy blindly — it's a structure to adapt. The important thing is the flow: fast CI checks → build artifact → staged deployment with verification at each step.

For teams using Docker in production, the container build step becomes critical — read our Docker best practices for optimizing build times and image security.

Frequently Asked Questions

How long should a CI/CD pipeline take?

CI should complete in under 10 minutes for PR checks. The full CD pipeline (including staging verification and canary rollout) can take 30-60 minutes, but that time is automated — no human is waiting. If your CI regularly exceeds 15 minutes, it's too slow and developers will work around it.

Should we use monorepo or multi-repo for CI/CD?

Both work. Monorepos need smart change detection (only build/test affected packages). Multi-repos need coordinated releases for shared dependencies. For teams under 50 engineers, monorepo is usually simpler. Above that, the tooling investment for monorepo CI (Nx, Bazel, Turborepo) becomes significant but worthwhile.

How do we handle database migrations in CI/CD?

Always use the expand-contract pattern. Migrations must be backward-compatible because during rolling deployments, old and new code runs simultaneously. Run migrations as a separate step before deploying new code. Never combine a destructive migration with a code deploy.

Is Jenkins still worth using in 2026?

Only if you have specific requirements that cloud-hosted CI can't meet (air-gapped networks, unusual hardware, extreme customization). For everyone else, GitHub Actions or GitLab CI provide a better experience with less maintenance. If you're running Jenkins, consider migrating — the maintenance burden is real.

How do we start with CI/CD if we have nothing today?

Start small: (1) Set up GitHub Actions with linting and unit tests on PR. (2) Add automated deployment to staging on merge to main. (3) Add integration tests as a staging gate. (4) Graduate to production deploys with manual approval, then automated canary. This progression takes most teams 2-3 months.

What's the difference between CI/CD and GitOps?

CI/CD pushes changes through a pipeline to production. GitOps pulls the desired state from Git — an operator (like ArgoCD) watches a Git repo and reconciles the actual state to match. GitOps is a deployment pattern that replaces the "CD push" with a "CD pull." Most teams use both: CI pushes artifacts + updates Git manifests, GitOps pulls and deploys.
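The "pull and reconcile" idea is easiest to see as a diff between two states. Below is a toy Python sketch that makes simplifying assumptions: each deployment is reduced to a name and an image tag, and anything missing from Git is deleted. A real operator such as ArgoCD compares full Kubernetes manifests, runs this loop continuously, and prunes removed resources only if configured to.

```python
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Plan the actions that make `actual` match `desired`.

    Both maps are deployment name -> image tag. `desired` comes from Git;
    `actual` is what the cluster is currently running.
    """
    actions = []
    for name, image in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name} ({image})")
        elif actual[name] != image:
            actions.append(f"update {name}: {actual[name]} -> {image}")
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")  # pruning: not in Git, so remove it
    return actions


desired = {"api": "registry/api:v42", "worker": "registry/worker:v7"}
actual = {"api": "registry/api:v41", "legacy-cron": "registry/cron:v1"}
for action in reconcile(desired, actual):
    print(action)
# update api: registry/api:v41 -> registry/api:v42
# create worker (registry/worker:v7)
# delete legacy-cron
```

In this model, CI's only deployment job is to change `desired` (commit a new image tag to the manifests repo), and the operator does the rest.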

CI/CD Pipeline Best Practices: From Code to Production