Weekly Progress Report Thread: Enhanced Media Experience with AI-Powered Commercial Detection and Replacement

Week 7 Updates:

Blockers:
I got a number of errors while compiling the model. I have shared them in the project's Discord channel thread.

Next Week Goals:

  • Get the compilation done (hopefully I will get it done this week).
  • Run inference on the BBAI64 using the built-in C7x DSPs and MMA accelerator.

Minutes of Meeting (27-07-2024)

Attendees:

Key Points:

  • I gave an update that I had just completed the compilation process and will do inferencing on the BeagleBone AI-64 next.
  • Discussed with @lorforlinux about writing a comprehensive example on importing pre-trained custom models for inference on the BBAI64.
  • Discussed input pipeline with mentors.
  • @KumarAbhishek suggested designing the input pipeline in a way that, with minor adjustments, it could be used with other models as well (a rough sketch of this idea follows these notes).
  • Mentors addressed some general queries I had.
  • We discussed the feature extraction process for input videos, noting that it could add extra time to the overall system.
  • @lorforlinux suggested that I use a GitLab runner if I face any further system compatibility problems, and also suggested building CI for the BBAI64 code.
  • @KumarAbhishek recommended using the same repository for the BBAI64 code and seeing whether the CI can be optimized so that, when commits are made in the BBAI64 directory, only the part of the pipeline targeting the BBAI64 code runs, without affecting the already completed notebooks.
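As a starting point for the reusable input pipeline discussed above, here is a rough sketch (the function name, shapes, and preprocessing are placeholders for illustration, not the final pipeline). The target shape, mean, and scale are parameters so the same loader could feed other models with only minor adjustments:

import numpy as np

def load_features(path, target_shape=(150, 1152), mean=0.0, scale=1.0):
    """Load a pre-extracted feature array and prepare it for a model."""
    features = np.load(path).astype(np.float32)
    if features.shape != target_shape:
        raise ValueError(f"Expected shape {target_shape}, got {features.shape}")
    features = (features - mean) * scale   # placeholder normalization
    return features[np.newaxis, ...]       # add a batch dimension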

Overall, the meeting was very insightful.

Week 8 Updates:

  • Compiled the model and generated the necessary artifacts for inferencing on the BeagleBone AI-64.
  • Transferred the model, artifacts folder, etc., to the BeagleBone AI-64 via SSH. Directory structure:
β”œβ”€β”€ Model
β”‚   └── cnn_model.tflite
β”œβ”€β”€ artifacts
β”‚   β”œβ”€β”€ 26_tidl_io_1.bin
β”‚   β”œβ”€β”€ 26_tidl_net.bin
β”‚   β”œβ”€β”€ allowedNode.txt
β”‚   β”œβ”€β”€ cnn_model.tflite
β”‚   β”œβ”€β”€ param.yaml
β”‚   └── tempDir
β”‚       β”œβ”€β”€ 26_calib_raw_data.bin
β”‚       β”œβ”€β”€ 26_tidl_io_.perf_sim_config.txt
β”‚       β”œβ”€β”€ 26_tidl_io_.qunat_stats_config.txt
β”‚       β”œβ”€β”€ 26_tidl_io_1.bin
β”‚       β”œβ”€β”€ 26_tidl_io__LayerPerChannelMean.bin
β”‚       β”œβ”€β”€ 26_tidl_io__stats_tool_out.bin
β”‚       β”œβ”€β”€ 26_tidl_net
β”‚       β”‚   β”œβ”€β”€ bufinfolog.csv
β”‚       β”‚   β”œβ”€β”€ bufinfolog.txt
β”‚       β”‚   └── perfSimInfo.bin
β”‚       β”œβ”€β”€ 26_tidl_net.bin
β”‚       β”œβ”€β”€ 26_tidl_net.bin.layer_info.txt
β”‚       β”œβ”€β”€ 26_tidl_net.bin.svg
β”‚       β”œβ”€β”€ 26_tidl_net.bin_netLog.txt
β”‚       β”œβ”€β”€ 26_tidl_net.bin_paramDebug.csv
β”‚       β”œβ”€β”€ graphvizInfo.txt
β”‚       └── runtimes_visualization.svg
β”œβ”€β”€ environment.yml
β”œβ”€β”€ inferencing
β”‚   └── tflite_model_inferencing.ipynb
└── test data
    β”œβ”€β”€ X_test.npy
    └── y_test.npy
  • Due to insufficient eMMC storage (16GB), I flashed the bbai64-debian-11.8-xfce-edgeai-arm64-2023-10-07-10gb.img.xz image onto a 32GB SD card. After setting up the inferencing folder, 13GB remained available.
  • Conducted inferencing using onboard CPUs.
  • Attempted inferencing with the libtidl_tfl_delegate library.
    Inferencing code:
import tensorflow as tf

tflite_model_path = '../Model/cnn_model.tflite'
artifacts_folder_path = '../artifacts_folder'

# Load TIDL delegate
tidl_delegate = tf.lite.experimental.load_delegate('/usr/lib/libtidl_tfl_delegate.so', options={"artifacts_folder": artifacts_folder_path})
interpreter = tf.lite.Interpreter(model_path=tflite_model_path, experimental_delegates=[tidl_delegate])
# interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
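For reference, the step that actually runs the model after allocate_tensors() would look roughly like this (a sketch, reusing interpreter, input_details, and output_details from above, and assuming X_test.npy holds float32 inputs shaped (N, 150, 1152), with paths relative to the inferencing folder):

import numpy as np

# Run one sample through the interpreter (CPU or delegate, whichever was set up above).
X_test = np.load('../test data/X_test.npy').astype(np.float32)
sample = X_test[0:1]                                    # batch of one

interpreter.set_tensor(input_details[0]['index'], sample)
interpreter.invoke()
prediction = interpreter.get_tensor(output_details[0]['index'])
print("Model output:", prediction)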

Encountered a kernel restart error, resulting in the BeagleBone AI-64 turning off. It is now unable to boot, with the LEDs flickering as shown.

Next Week Goals:

  • Resolve the issue with the BeagleBone AI-64 and perform inferencing using the inbuilt C7x DSPs and MMA accelerator.
  • Collect dataset from the Set-Top Box.
  • Research methods to compress videos into the desired format for the model’s input.
  • Add an example of custom model compilation and inferencing to the BeagleBone AI-64 documentation.
  • Write the Week 6-7 blog.

We are running NVMe on the AI-64 and it is stable. If memory serves, it reaches at least 500-600 MB/s; the SD card is maybe 100 MB/s on a good day.


I hope you and others can figure this one out.

I have followed slowly to figure out what I did in the past versus what I can currently do to current processors on specific boards (BBAI-64).

I once set up a TFLite instance for armhf on the BBAI. It was ultimately successful. The build calculated the pixels in a specific area of the photo to β€œguess” what it was seeing…

In Louisiana, we get some odd birds. That bird pictured is not quite the bird in the build β€œguess.”

It is close, but I am still tracking down exactly what type of bird the pictured bird actually is. At first, I thought it was an Egret of sorts. It looks misshapen and odd for an Egret. So, I moved on, guessing it was an Egret.

Anyway, keep up the good work.

Seth

P.S. I have not dabbled in the AI Universe lately, as I still have a lot to learn and my motors cannot wait (as usual). Thank you for pushing yourself to your limits in your build. I cannot wait to see the final outcome!

I am with you. I was using the 16GB eMMC due to motor movement and stability.

I just ran out of space! Yikes, good luck and Godspeed.

I am going to try to get a 32GB micro SD Card like you did.

Seth

P.S. I am using an llvm-raw build with bazel right now to make the build successful. OpenJDK is not so easy on specific builds. Off to the loops!


Have you tried passing all the required options to the delegate? See the TI Deep Learning Library User Guide: TFLite Runtime + TIDL Heterogeneous Execution


Considering the substantial speed difference, NVMe would be a much better choice for real-time processing than an SD card.

Hi Seth,

Thanks for the encouragement and your kind words! The bird identification project sounds like a fascinating challenge. Identifying birds can be tricky, especially with variations in appearance.

I look forward to hearing more from you as you continue exploring AI.

Regards,
Aryan


Hi @Illia_Pikin
Yes, I passed all the required options (β€˜artifacts_folder’, β€˜tidl_tools_path’, β€˜import’), but I am still getting the kernel died error.

import tensorflow as tf

tflite_model_path = '../Model/cnn_model.tflite'
delegate_lib_path = '/usr/lib/libtidl_tfl_delegate.so'
delegate_options = {
    'artifacts_folder': '../artifacts_folder',
    'tidl_tools_path': '/usr/lib/',
    'import': 'no'
}

tidl_delegate = None  # so the check below works even if loading fails
try:
    tidl_delegate = tf.lite.experimental.load_delegate(delegate_lib_path, delegate_options)
    print(tidl_delegate)
except Exception as e:
    print("An error occurred:", str(e))

if tidl_delegate:
    print("I am inside")
    interpreter = tf.lite.Interpreter(model_path=tflite_model_path, experimental_delegates=[tidl_delegate])
else:
    print("I didn't go inside")
    interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

print("Input details:", input_details)
print("Output details:", output_details)

After debugging a bit, I found that the delegate loads correctly, but the kernel died error occurs while initializing the interpreter with the custom delegate. The interpreter initializes with no error when there is no custom delegate.
I am assuming that the error is because memory is going out of bounds (insufficient RAM).
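One rough way to check that assumption (just a sketch; it reuses tflite_model_path and tidl_delegate from the snippet above) would be to log MemAvailable from /proc/meminfo around the interpreter creation and see how close to zero it gets:

def mem_available_kb():
    # Read MemAvailable from /proc/meminfo (Linux only).
    with open('/proc/meminfo') as f:
        for line in f:
            if line.startswith('MemAvailable:'):
                return int(line.split()[1])
    return -1

print("MemAvailable before:", mem_available_kb(), "kB")
interpreter = tf.lite.Interpreter(model_path=tflite_model_path,
                                  experimental_delegates=[tidl_delegate])
print("MemAvailable after interpreter init:", mem_available_kb(), "kB")
interpreter.allocate_tensors()
print("MemAvailable after allocate_tensors:", mem_available_kb(), "kB")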

This is the CNN model I am currently using:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Reshape, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def cnn_model():
    model = Sequential()
    model.add(Reshape((150, 1152, 1), input_shape=(150, 1152)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))

    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))

    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))

    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
  • Total params: 85060609 (324.48 MB)
  • Trainable params: 85060609 (324.48 MB)
  • Non-trainable params: 0 (0.00 Byte)
    At this point I can only think of reducing the size of the model. Any suggestions?
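For context on where those parameters come from: almost all of them sit in the Flatten β†’ Dense(256) connection (the last pooling output is 18 Γ— 144 Γ— 128 = 331,776 features, so that one layer holds β‰ˆ 84.9M weights). One option I can think of is swapping Flatten for GlobalAveragePooling2D, which brings the total down to around 160k parameters; a rough, untested sketch (accuracy impact unknown):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Reshape, Conv2D, MaxPooling2D,
                                     GlobalAveragePooling2D, Dense, Dropout)

def cnn_model_small():
    # Same convolutional stack, but pooled globally instead of flattened,
    # which removes the huge Flatten -> Dense(256) weight matrix.
    model = Sequential()
    model.add(Reshape((150, 1152, 1), input_shape=(150, 1152)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(GlobalAveragePooling2D())   # 128 features instead of 331,776
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model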

When you compiled the artifacts - did it complain about something?

If you go to NVMe, set up a fairly large swap file, like 40 GB. You have 4 cores and I am assuming all are being lit up. If you do the swap on the SD card it might burn it up quickly and all your work evaporates.

No, there were no complaints. Here is the compilation code I used. Do you think I missed anything?

# origin: https://github.com/TexasInstruments/edgeai-tidl-tools/blob/master/examples/osrt_python/tfl/tflrt_delegate.py

import yaml
import json
import shutil
import os
import argparse
import tflite_runtime.interpreter as tflite
import numpy as np
from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--config", help="Config JSON path", required=True)
parser.add_argument("-d", "--debug_level", default=0, help="Debug Level: 0 - no debug, 1 - rt debug prints, >=2 - increasing levels of debug and trace dump", required=False)
args = parser.parse_args()
os.environ["TIDL_RT_PERFSTATS"] = "1"

with open(args.config) as f:
    config = json.load(f)

required_options = {
    "tidl_tools_path": os.environ["TIDL_TOOLS_PATH"],
    "artifacts_folder": "artifacts",
}
optional_options = {
    "platform": "J7",
    "version": " 7.2",
    "tensor_bits": 8,
    "debug_level": args.debug_level,
    "max_num_subgraphs": 16,
    "deny_list": "",
    "accuracy_level": 1,
    "advanced_options:calibration_frames": 2,
    "advanced_options:calibration_iterations": 5,
    "advanced_options:output_feature_16bit_names_list": "",
    "advanced_options:params_16bit_names_list": "",
    "advanced_options:quantization_scale_type": 0,
    "advanced_options:high_resolution_optimization": 0,
    "advanced_options:pre_batchnorm_fold": 1,
    "ti_internal_nc_flag": 1601,
    "advanced_options:activation_clipping": 1,
    "advanced_options:weight_clipping": 1,
    "advanced_options:bias_calibration": 1,
    "advanced_options:add_data_convert_ops":  0,
    "advanced_options:channel_wise_quantization": 0,
}


def gen_param_yaml(artifacts_folder_path, config):
    layout = "NCHW" if config.get("data_layout") == "NCHW" else "NHWC"

    model_file_name = os.path.basename(config["model_path"])

    dict_file = {
        "task_type": config["model_type"],
        "target_device": "pc",
        "session": {
            "artifacts_folder": "",
            "model_folder": "model",
            "model_path": model_file_name,
            "session_name": config["session_name"],
        },
        "postprocess": {
            "data_layout": layout,
        },
        "preprocess": {
            "data_layout": layout,
            "mean": config["mean"],
            "scale": config["scale"],
        }
    }

    with open(os.path.join(artifacts_folder_path, "param.yaml"), "w") as file:
        yaml.dump(dict_file, file)

    if config["session_name"] in ["tflitert", "onnxrt"]:
        shutil.copy(config["model_path"], os.path.join(artifacts_folder_path, model_file_name))


def infer_image(interpreter, image_files, config):
    input_details = interpreter.get_input_details()
    floating_model = input_details[0]['dtype'] == np.float32
    batch = input_details[0]['shape'][0]
    height = input_details[0]['shape'][1]  # 150
    width = input_details[0]['shape'][2]   # 1152

    # Initialize input_data array with the shape [batch, height, width]
    input_data = np.zeros((batch, height, width), dtype=np.float32)

    # Process calibration arrays
    for i in range(batch):
        img = np.load(image_files[i])  # Load numpy array directly
        if img.shape != (height, width):
            raise ValueError(f"Array {image_files[i]} has shape {img.shape}, expected ({height}, {width})")
        input_data[i] = img

    # Ensure input data type matches the model’s requirement
    if not floating_model:
        input_data = np.uint8(input_data)
        config['mean'] = [0]
        config['scale'] = [1]

    interpreter.resize_tensor_input(input_details[0]['index'], [batch, height, width])
    interpreter.allocate_tensors()
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()



def compose_delegate_options(config):
    delegate_options = {}
    delegate_options.update(required_options)
    delegate_options.update(optional_options)

    if "artifacts_folder" in config:
        delegate_options["artifacts_folder"] = config["artifacts_folder"]
    if "tensor_bits" in config:
        delegate_options["tensor_bits"] = config["tensor_bits"]
    if "deny_list" in config:
        delegate_options["deny_list"] = config["deny_list"]
    if "calibration_iterations" in config:
        delegate_options["advanced_options:calibration_iterations"] = config["calibration_iterations"]
    delegate_options["advanced_options:calibration_frames"] = len(config["calibration_images"])

    if config["model_type"] == "od":
        delegate_options["object_detection:meta_layers_names_list"] = config["meta_layers_names_list"] if (
            "meta_layers_names_list" in config) else ""
        delegate_options["object_detection:meta_arch_type"] = config["meta_arch_type"] if (
            "meta_arch_type" in config) else -1

    if ("object_detection:confidence_threshold" in config and "object_detection:top_k" in config):
        delegate_options["object_detection:confidence_threshold"] = config["object_detection:confidence_threshold"]
        delegate_options["object_detection:top_k"] = config["object_detection:top_k"]

    return delegate_options


def run_model(config):
    print("\nRunning_Model : ", config["model_name"], "\n")

    # set delegate options
    delegate_options = compose_delegate_options(config)

    # delete the contents of this folder
    os.makedirs(delegate_options["artifacts_folder"], exist_ok=True)
    for root, dirs, files in os.walk(delegate_options["artifacts_folder"], topdown=False):
        [os.remove(os.path.join(root, f)) for f in files]
        [os.rmdir(os.path.join(root, d)) for d in dirs]

    calibration_images = config["calibration_images"]
    numFrames = len(calibration_images)

    # set interpreter
    delegate = tflite.load_delegate(os.path.join(
        delegate_options["tidl_tools_path"], "tidl_model_import_tflite.so"), delegate_options)
    interpreter = tflite.Interpreter(
        model_path=config["model_path"], experimental_delegates=[delegate])

    # run interpreter
    for i in range(numFrames):
        start_index = i % len(calibration_images)
        input_details = interpreter.get_input_details()
        batch = input_details[0]["shape"][0]
        input_images = []
        # for batch > 1 input images will be more than one in single input tensor
        for j in range(batch):
            input_images.append(
                calibration_images[(start_index+j) % len(calibration_images)])
        infer_image(interpreter, input_images, config)

    gen_param_yaml(delegate_options["artifacts_folder"], config)
    print("\nCompleted_Model : ", config["model_name"], "\n")

run_model(config)

GitLab link to the code: Link

And the config.json file content:

{
  "model_name": "cnn_model",
  "model_path": "./Model/cnn_model.tflite",
  "calibration_images": [
    "./cal_Data/cal_1.npy",
    "./cal_Data/cal_2.npy",
    "./cal_Data/cal_3.npy",
    "./cal_Data/cal_4.npy"
  ],
  "calibration_iterations": 15,
  "tensor_bits": 16,
  "artifacts_folder": "artifacts",
  "mean": [0, 0, 0],
  "scale": [0.003921568627, 0.003921568627, 0.003921568627],
  "session_name": "tflitert",
  "model_type": "classification"
}

This sounds great and it should work. Before trying that, let me first verify that the issue isn’t with the compilation. If the problem is indeed with the initialization of the interpreter along with the custom delegate, then we can proceed with setting up the NVMe drive.
Thanks for the suggestion, @lorforlinux!

Tested (Link) and the issue is not due to insufficient RAM. That means there is some problem with the model compilation process.
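One thing I plan to try next (only a sketch; I am assuming the runtime delegate accepts a 'debug_level' option, as the compilation delegate does in TI's osrt_python examples) is re-running the delegate load as a plain script from a terminal rather than the notebook, with a higher debug level, so TIDL prints more information before anything dies:

import tensorflow as tf

tflite_model_path = '../Model/cnn_model.tflite'
delegate_options = {
    'artifacts_folder': '../artifacts_folder',
    'tidl_tools_path': '/usr/lib/',
    'import': 'no',
    'debug_level': 2,  # assumption: higher values enable verbose TIDL trace output
}
tidl_delegate = tf.lite.experimental.load_delegate(
    '/usr/lib/libtidl_tfl_delegate.so', delegate_options)
interpreter = tf.lite.Interpreter(model_path=tflite_model_path,
                                  experimental_delegates=[tidl_delegate])
interpreter.allocate_tensors()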

Week 6-7 Blog: Introduction to edgeai and TIDL


…but the kernel died error occurs while initializing the interpreter with the custom delegate.

Just a guess, but this could be due to some limitation of the C7x DSP.
One more thing you can try is to set β€œtensor_bits” to 8 in config.json.