The 3rd practical end-to-end image/video compression challenge

 

1. Challenge motivation and description

End-to-end image/video compression has been a research focus of both academia and industry for over seven years. A number of technologies have been developed, such as auto-encoder neural networks, probability-estimation neural networks, and conditional end-to-end video coding frameworks. Recently, the performance of both end-to-end image and video compression schemes has surpassed that of H.266/Versatile Video Coding (VVC) under certain test conditions. To promote practical use, we believe it is time to consider the complexity of end-to-end image/video compression schemes, especially their decoding complexity.

This challenge calls for novel end-to-end image/video compression algorithms that achieve good rate-distortion (R-D) performance under certain complexity constraints. As in last year's challenge, we set a weight in the quality metric to balance performance and decoding complexity for both the end-to-end image and video compression tracks. In addition, we add another track that constrains the decoder complexity in kMAC/pixel, encouraging more hardware-friendly solutions. Furthermore, another important feature of practical end-to-end image/video compression solutions is cross-platform consistency; therefore, this year the submitted bitstreams must be decoded and reconstructed successfully on the platform provided by the organizers.

Participants are required to compress all images/videos in the Test Dataset. In the end-to-end image compression track, the actual bits per pixel (bpp) must not exceed a target bpp, which is set to the bpp of the test image coded by BPG with quantization parameter 28. In the end-to-end video compression track, the actual bitrate must not exceed a target bitrate (kbps), which is set to the bitrate of the test sequence coded by VTM with quantization parameter 27 under the random access configuration.
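
For reference, the rate constraints can be checked directly from the bitstream sizes. The sketch below (Python) shows the standard bpp and bitrate computations; the file names, the 50 fps frame rate, and the target values are hypothetical placeholders, not the official anchor numbers.

```python
import os

def bpp(bitstream_path: str, width: int, height: int) -> float:
    """Bits per pixel of a compressed image bitstream."""
    bits = os.path.getsize(bitstream_path) * 8
    return bits / (width * height)

def kbps(bitstream_path: str, num_frames: int, fps: float) -> float:
    """Average bitrate in kbps of a compressed video bitstream."""
    bits = os.path.getsize(bitstream_path) * 8
    return bits * fps / num_frames / 1000.0

# Hypothetical targets; the real ones come from the BPG (QP 28) and
# VTM (QP 27, random access) anchors for each test item.
assert bpp("I01.bin", 3840, 2160) <= 0.35
assert kbps("V01.bin", num_frames=96, fps=50.0) <= 4200.0
```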

Detailed information on each track is provided below.

 

2. End-to-end image compression track

2.1 Dataset

  • Training and Validation Dataset: A collection of about 1,600 high-resolution images will be provided as the training and validation dataset. Participants are free to split the provided images into training and validation sets, and they may also use other datasets for training and validation.

  • Test Dataset: 20 images at 4K resolution will be used for evaluation. All images will be in the RGB color space and PNG file format. The images will be distributed to all participants on the test dataset release date (see Section 4, Deadlines), and participants are required to compress them within 72 hours.

 

2.2 Evaluation metrics

2.2.1 Track 1

The performance Q will be evaluated as a weighted combination of the PSNR gain and the decoding time:

Q = w × ΔPSNR - dTime

where PSNR is the average of the PSNRs of the R, G, and B components, and ΔPSNR is obtained by subtracting the PSNR of BPG from that of the proposed method. dTime is the time, in seconds, used for neural network model loading, entropy decoding, and image reconstruction on a GeForce RTX 4090 GPU provided by the organizers; the submitted methods must therefore decode successfully on this GPU. The weight w is set to 1 to achieve a good balance between performance and decoding complexity.
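
A minimal sketch of this scoring, assuming 8-bit RGB images loaded as NumPy arrays; the function names are illustrative, and the exact measurement methodology is determined by the organizers.

```python
import numpy as np

def psnr(ref: np.ndarray, dec: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between a reference and a decoded channel."""
    mse = np.mean((ref.astype(np.float64) - dec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak * peak / mse)

def rgb_psnr(ref: np.ndarray, dec: np.ndarray) -> float:
    """Average of the per-channel PSNRs of the R, G, and B components (HxWx3 arrays)."""
    return float(np.mean([psnr(ref[..., c], dec[..., c]) for c in range(3)]))

def track1_image_score(psnr_proposed: float, psnr_bpg: float,
                       dtime_s: float, w: float = 1.0) -> float:
    """Q = w * dPSNR - dTime, with w = 1 as specified above."""
    return w * (psnr_proposed - psnr_bpg) - dtime_s
```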

 

2.2.2 Track 2

The performance Q is evaluated by the PSNR, calculated as the average of the PSNRs of the R, G, and B components. The decoder complexity shall not exceed 100 kMAC/pixel.
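
For orientation, a decoder's kMAC/pixel can be estimated layer by layer: a k × k convolution costs k·k·C_in·C_out multiply-accumulate operations per output position. The sketch below uses hypothetical layer dimensions to illustrate the accounting; the official counting methodology is defined by the organizers.

```python
def conv_kmac_per_pixel(c_in: int, c_out: int, k: int,
                        out_h: int, out_w: int,
                        image_h: int, image_w: int) -> float:
    """kMAC per decoded image pixel contributed by one k x k conv layer."""
    macs = k * k * c_in * c_out * out_h * out_w
    return macs / (image_h * image_w) / 1000.0

# Hypothetical example: a 3x3, 64 -> 64 conv running at quarter resolution
# of a 3840x2160 image contributes about 2.3 kMAC/pixel; the sum of such
# terms over all decoder layers must stay at or below 100 kMAC/pixel.
print(conv_kmac_per_pixel(64, 64, 3, 540, 960, 2160, 3840))
```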

 

2.3 Submission requirements

  • Participants are requested to submit a decoder, along with a Docker environment and the corresponding script that runs the decoder.

  • Participants are requested to submit the compressed bitstreams. The bitstreams shall be named like I01.bin.

  • Participants are requested to submit the decoded images. The decoded images shall be named like I01dec.png.

 

3. End-to-end video compression track

3.1 Dataset

  • Training and Validation Dataset: It is recommended to use the UVG and CDVL datasets for training. Participants are free to split the provided videos into training and validation sets, and they may also use other datasets for training and validation.

  • Test Dataset: 10 video sequences at 1080p resolution will be used for evaluation. Each sequence contains 96 frames, and all sequences will be in the YUV 4:2:0 color space. The sequences will be distributed to all participants on the test dataset release date (see Section 4, Deadlines), and participants are required to compress them within 72 hours.

 

3.2 Evaluation metrics

The decoded video sequences will be evaluated in the YUV 4:2:0 color space. The weighted average PSNR of the Y, U, and V components, PSNR = (6 × PSNR_Y + PSNR_U + PSNR_V) / 8, will be used to evaluate the distortion of the decoded video sequences. An anchor, VTM-17.0 with QP = 27 under the random access configuration defined in the VTM common test conditions (encoder_randomaccess_vtm.cfg), will be provided. The actual bitrate (kbps) of the bitstream of each video sequence must not exceed the target bitrate of the test video coded by the anchor. The intra period of the proposed submission shall be no larger than that of the anchor.
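
A minimal sketch of the weighted PSNR, assuming 8-bit YUV 4:2:0 planes loaded as NumPy arrays; whether the PSNR is averaged per frame or computed over the whole sequence is a detail for the organizers to confirm.

```python
import numpy as np

def psnr(ref: np.ndarray, dec: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between a reference and a decoded plane."""
    mse = np.mean((ref.astype(np.float64) - dec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak * peak / mse)

def weighted_yuv_psnr(ref_yuv, dec_yuv) -> float:
    """PSNR = (6 * PSNR_Y + PSNR_U + PSNR_V) / 8 for (Y, U, V) plane tuples."""
    p = [psnr(r, d) for r, d in zip(ref_yuv, dec_yuv)]
    return (6.0 * p[0] + p[1] + p[2]) / 8.0
```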

3.2.1 Track 1

The performance Q will be evaluated as a weighted combination of the PSNR gain and the decoding time:

Q = w × ΔPSNR - dTime

where ΔPSNR is obtained by subtracting the PSNR of VTM from that of the proposed method. dTime is the average per-frame time, in seconds, used for entropy decoding and video reconstruction on a GeForce RTX 4090 GPU provided by the organizers; the submitted methods must therefore decode successfully on this GPU. The weight w is set to 1 to provide a good balance between the complexity and the performance.
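
One plausible way to measure the average per-frame decoding time is sketched below; decode_fn is a hypothetical stand-in for the submitted decoder, and if decoding runs on the GPU, pending kernels should be synchronized (e.g., torch.cuda.synchronize() in PyTorch) before the timer is stopped.

```python
import time

def avg_decode_time_per_frame(decode_fn, bitstream_path: str,
                              num_frames: int) -> float:
    """Wall-clock seconds per frame for entropy decoding plus reconstruction."""
    start = time.perf_counter()
    decode_fn(bitstream_path)  # hypothetical decoder call: decodes all frames
    return (time.perf_counter() - start) / num_frames
```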

3.2.2 Track 2

The performance Q is evaluated by the weighted PSNR of the Y, U, and V components, PSNR = (6 × PSNR_Y + PSNR_U + PSNR_V) / 8. The decoder complexity shall not exceed 100 kMAC/pixel.

 

3.3 Submission requirements

  • Participants are requested to submit a decoder, along with a Docker environment and the corresponding script that runs the decoder.

  • Participants are requested to submit the compressed bitstreams. The bitstreams shall be named like V01.bin.

  • Participants are requested to submit the decoded video sequences. The decoded video sequences shall be named like V01dec.yuv.

 

4. Deadlines

  • Jul. 1: registration for the competition. Teams can register by sending the team name, team members, and institution to lil1@ustc.tsg211.com or cmjia@pku.edu.cn.

  • Jul. 1: release of the training and validation dataset

  • Jul. 31: deadline of the challenge paper submission

  • Aug. 10: notification of the challenge paper acceptance

  • Aug. 15: submission of the camera-ready paper

  • Sept. 1: submission of the decoder and docker environment

  • Sept. 2: release of the test dataset

  • Sept. 6: submission of the compressed bitstreams and decoded images/videos.

  • Sept. 15: notification of winners and leaderboards.

  • Oct. 2-4: challenge session at the MMSP 2024 conference. The winners will receive winner certificates provided by the MMSP organizing committee. Selected teams will be invited to present at the conference.

 

5.  Organizers

 

6.  Sponsorship

This challenge is sponsored by Shanghai Shuangshen Information Technology Co., Ltd (ATTRSense), with a $500 prize for the winner of each track. ATTRSense is a company targeting “AI for Codec and Codec for AI”. A brief introduction of ATTRSense follows:

Founded in June 2020, Shanghai Shuangshen Information Technology Co., Ltd is dedicated to revolutionizing traditional image and video codec technology with AI. The company aims to provide compression products and solutions, ranging from algorithms to chips, for industries such as security, power grid, internet, healthcare, and the metaverse. Its solutions address the challenges of transmitting, storing, processing, and analyzing large volumes of unstructured data.

More than 80% of the company's personnel work in research and development, with talent recruited from top universities in China and abroad, such as Peking University, Zhejiang University, Shanghai Jiaotong University, University of Science and Technology of China, Fudan University, and the University of Michigan. The company has also developed ANF, which it describes as the world's first AI end-to-end image codec for mobile terminals; the codec supports real-time coding and offers strong compression performance.


7. End-to-end image/video compression results for MMSP 2024

The winners of each track are as follows:

  • Image compression (track 1) winner: USTC-iVC

  • Image compression (track 2) winner: USTC-iVC

  • Video compression (track 1) winner: USTC-iVC

  • Video compression (track 2) winner: BVI-VC