Write the problem out - i.e., understand the problem

What is the problem?

We have several DNA fragments, each with a start, end, and quality score.
Some fragments overlap, and we can’t select two that overlap.
The goal is to choose a subset of non-overlapping fragments that gives the maximum total quality.

Formally:
Given intervals $(s_{i}, f_{i}, w_{i})$ , find $max \sum w_{i}$ such that no two chosen intervals overlap.

What are the sub-problems?

First, we need to find a way to identify sub-problems.

Every fragment has a start and an end position, and some of them overlap.

If we want to build an optimal solution step by step, we need a clear order to process them. Otherwise, we wouldn’t know which fragments are compatible or which combinations we’ve already evaluated.

Question

How can we guarantee that, when we evaluate a fragment, all the possible compatible choices have already been considered?

How can we quickly find the last fragment that doesn’t overlap with the current one?

Answer

The natural answer is to sort all fragments by their end position.

When the fragments are sorted, the problem of finding the optimal subset up to fragment i depends only on:

the best solution without using fragment i (dp[i-1]), and
the best solution that includes fragment i, which equals the fragment’s weight wᵢ plus the best total quality before the last compatible fragment (dp[p(i)], where p(i) ends before sᵢ).

So the sub-problem is:

What is the maximum achievable total quality among the first i fragments?

Each sub-problem reuses the results of smaller ones → that’s the essence of dynamic programming.

What would the solution roughly look like?

We build an array dp where dp[i] is the best total quality for the first i fragments.
For each fragment i:

dp[i] = max(
    w[i] + dp[p(i)],   # include fragment i
    dp[i-1]            # exclude fragment i
)

The final answer is dp[n].

Algorithm outline:

Sort fragments by end.
For each fragment, find p(i) = the last fragment that ends before start[i].
Apply the recurrence above.

Question

Which is the time complexity of the algorithm?

Answer

Time complexity: O(n log n) (sorting + binary search for p(i)).

Recurrence

Let us understand more in depth how the recurrence works.

Define the Base Case

The simplest possible situation is when we have no fragments to consider.
In that case, the maximum total quality is 0.

So:

$d p [0] = 0$

That’s our base case — the starting point for building all other dp[i].

What Decision Do I Make at Step n?

At fragment n (the last one in the sorted list), we face a binary choice:

Include the fragment → add its weight wₙ to the best total before the last non-overlapping fragment (dp[p(n)]);
Exclude the fragment → just keep the previous best total (dp[n−1]).

Formally:

$d p [n] = max (w_{n} + d p [p (n)], d p [n - 1])$

This is the decision rule — the core of the DP recurrence.

What Decision Do I Make at Step n−1 and n+1?

The same decision logic applies at every step:

At step n−1, you decide whether to include fragment n−1 or skip it.
- If you include it: add $w_{n - 1}$ + dp[p(n−1)].
- If you skip it: use dp[n−2].
At step n+1 (if we extended the sequence), you’d again check whether the new fragment n+1 overlaps with previous ones, then:
$d p [n + 1] = max (w_{n + 1} + d p [p (n + 1)], d p [n])$

— same pattern, just shifted one position forward.

So at every i, the decision is always:

Include fragment i (and jump back to the last compatible one) or exclude it.

Dimension of the memoisation array

Remember that in Dynamic Programming, Programming means that we compute a table with all the solutions of the subproblems of our problem until we reach the solution of the instance of our problem.

Since we need to store all the solutions (i.e., memoisation) we need to understand which is the dimension of our table (1 row and n columns ⇒ 1 D, m rows and n columns ⇒ 2 D, m rows and n columns and j values for each cell ⇒ 3D and so on..)

How many state variables do we have in our subproblems?

Our subproblem only needs to remember how far we’ve gone in the sorted list of fragments.

We have 1 state variable in our subproblem:

the index i of the fragment we’re considering (after sorting by end time).

Each subproblem asks:

“What is the maximum achievable total quality using the first i fragments?”

So the dynamic programming table dp[i] has one dimension — it depends only on how many fragments we’ve considered so far.

How big is the set of the variables?

The variable i ranges over all fragments:

$i \in 0, 1, 2, \dots, n$

So there are n + 1 possible subproblems (including the base case i = 0).

Each subproblem is solved in $O (1)$ once we know p(i) and computing all p(i) values requires $O (n lo g n)$ (via binary search after sorting).

Now Code it!

Once we got the solution to these questions we can code/sketch in pseudocode our algorithm!

Marco Zecchini's page

Explorer

Weighted DNA Fragment Selection - GUIDED

Write the problem out - i.e., understand the problem

What is the problem?

What are the sub-problems?

What would the solution roughly look like?

Recurrence

Define the Base Case

What Decision Do I Make at Step n?

What Decision Do I Make at Step n−1 and n+1?

Dimension of the memoisation array

How many state variables do we have in our subproblems?

How big is the set of the variables?

Now Code it!

Graph View

Table of Contents

Backlinks