SmartUpdate

Introduction

SmartUpdate focuses on the optimization of rule updating for the well-known packet classification problem on the Internet.

It's the project I worked on as a software engineer intern at Juniper Networks in Beijing, China, where I was supervised by Bill Fang.

Details

1. Problems and Motivations

Rules updating is a subproblem in packet classification. Given an original rule sets and pre-computed data structures, rules updating required inserting or deleting certain rules, which means the original data structures should be updated online.

To support cloud computing, we are required to complete updating operations in restricted time with high after-updating classification performance at the same time. However, typical algorithms including Tuple Space Search and HyperSplit are not able to achieve high-speed packet classification and rapid rules updating simultaneously.

2. Aims

We want to develop an algorithm to:

  1. achieve better updating speed;
  2. achieve better after-updating classification performance;
  3. handle the trade-off between the above performance smartly.

3. Methods and Solutions

1. Assessment algorithms

We propose assessment algorithms to predict the updating complexity for various rule sets based on segment distribution, overlapping density, tuple space size, and rule set size.

For the decision tree based algorithms, we have

def Preprocess(ruleset, segment_list):
    overlap_density = InitialList(size = DIM_MAX)
    for i in range(0, DIM_MAX): 
        segment_list = GenerateSegment(ruleset, i)
        segment_distribution[i] = segment_list.size
        for segment in segment_list:
            for rule in ruleset:
                if IsOverlap(rule, segment):
                    overlap_density[i]++
        overlap_densit[i] /= ruleset.size
    return overlap_density, segment_distribution

def Estimator(new_ruleset, original_ruleset):
    segment_list = GetInfo(original_ruleset)
    overlap_density, segment_distribution = Preprocess(new_ruleset, segment_list)
    time_base = SystemTest()
    operation_num = 0
    for i in range(0, DIM_MAX): 
        operation_num += overlap_density[i] * segment_distribution[i]
    estimator = operation_num * time_base
    return estimator

For tuple space based algorithms, we have

def Preprocess(ruleset):
    tuple_list = InitialList()
    for rule in ruleset:
        tuple_found = LinearSearch(tuple_list, rule)
        key = CreateKey(rule)
        if tuple_found is not None:
            tuple_new = CreateTuple(key)
            AddListItem(tuple_list, tuple_new)
    return tuple_list.size, ruleset.size

def Estimator(new_ruleset, original_tuple_num):
    new_tuple_num, new_rule_num = Preprocess(ruleset)
    time_base = SystemTest()
    operation_num = 0
    for i in range(0, tuple_num - 1): 
        operation_num += original_tuple_num + i
    operation_num += (new_tuple_num + original_tuple_num) * (new_rule_num + new_tuple_num)
    estimator = operation_num * time_base
    return estimator

2. SmartUpdate Algorithms

Leveraging grouping and dynamic decision-making mechanism, we propose the SmartUpdate algorithm.

def SmartUpdateIncrementalUpdate(ruleset, new_ruleset, toleration,
preference):
    if Estimator([ruleset; new_ruleset]) < toleration:
        return HyperSplitFullVolumeUpdate([ruleset; new_ruleset])
    else:
        data_structures = InitialList()
        switch (preference):
        case CLASSIFICATION_SPEED :
            return SmartUpdateFullVolumeUpdate([ruleset; new_ruleset], toleration, CLASSIFICATION_SPEED)
        case UPDATING_SPEED :
            original_data_structure = GetInfo(ruleset)
            new_data_structure = TupleSpaceSearchFullVolumeUpdate(ruleset)
            AddListItem(data_structures, original_data_structure) 
            AddListItem(data_structures, new_data_structure)
        default:
            data_structures = HybridUpdate(ruleset, new_ruleset,
toleration)
    return data_structures

def HybridUpdate(new_ruleset, original_tuple_num):
    original_data_structure GetInfo(ruleset)
    if Estimator(original_data_structure, new_ruleset) <
toleration: 
    return IncrementalUpdate(original_data_structure, new_ruleset)
    else: 
        data_structures = InitialList()
        if Estimator(new_ruleset) < toleration:
            new_data_structure = HyperSplitFullVolumeUpdate(ruleset)
        else:
            new_data_structure = SmartUpdateFullVolumeUpdate(ruleset, toleration, CLASSIFICATION_SPEED)
        AddListItem(data_structures, original_data_structure)
        AddListItem(data_structures, new_data_structure)
    return data_structures

4. Performance

For rules updating speed, SmartUpdate Strategy 2 beats HyperSplit about one hundred times.

Rules Updating Speed

For after-updating classification speed, SmartUpdate beats HyperSplit about 5 times on complex rules set (like Fire Wall 5K, 7K, and 10K).

Related Technologies

  1. DPDK: We use DPDK to generate and forward Internet packets.
  2. Rules Generator, Traces Generator: We refined these tools to generate rules and traces and do not leak the confidential information of industry rule sets. 
  3. Algorithms: We use hash tables in parts related to tuple space, and build balanced decision trees using recursion in split methods.

Programming Languages

  1. C: 70%. The main projects are using C (for better performance).
  2. Python: 25%. I write python scripts for preprocessing of rules, handling trade-off, and testing.
  3. Bash: 5%. I write bash scripts to do testing.

Link to Codes

The core codes are open sourced on GitHub.