
BandwidthEstimationDataset

ModelScope Community. Updated: 2025-11-30. Indexed: 2025-05-24
Download link:
https://modelscope.cn/datasets/ByteDance/BandwidthEstimationDataset
Description:
This repo releases a dataset of trajectories for ByteDance Teams audio/video calls. The data is collected from peer-to-peer ByteDance Teams audio/video calls on the Douyin live streaming application. It can be used to study bandwidth estimation (BWE), for example by training deep learning models or constructing reward models for training.

Each trajectory corresponds to one audio/video call leg and consists of a sequence of:

- 51-dimensional observation vector: computed from packet information received by the client in one audio/video call.
- The unique, randomly assigned id of the call.
- The unique id of the BWE policy used to collect the data of this call.
- The bandwidth that the BWE policy predicts for the bottleneck link.
- The request rate, which denotes the maximum bandwidth needed to ensure the quality of this call.

The observation vector at time step n encapsulates observed network statistics that characterize the state of the bottleneck link between the sender and receiver over the most recent short-term monitor interval (MI) of 100ms, the most recent median-term MI of 500ms, and the most recent long-term MI of 1000ms. Specifically, the observation vector tracks 17 different network features over short, median and long term MIs (17 features x (1 short term MI + 1 median term MI + 1 long term MI) = 51).

The 17 features and their descriptions are as follows. Features are based on packets received during the short and long term monitor intervals.

- Receiving rate: rate at which the client receives data from the sender during an MI, unit: bps.
- Number of received packets: total number of packets received in an MI, unit: packet.
- Received bytes: total number of bytes received in an MI, unit: Bytes.
- Sending rate: rate at which the server sends data to the receiver during an MI, unit: bps.
- Number of sent packets: total number of packets sent in an MI, unit: packet.
- Sent bytes: total number of bytes sent in an MI, unit: Bytes.
- Queuing delay: average delay of packets received in an MI minus the minimum packet delay observed so far, unit: ms.
- Delay: average delay of packets received in an MI minus a fixed base delay of 200ms, unit: ms.
- Minimum seen delay: minimum packet delay observed so far, unit: ms.
- Delay ratio: average delay of packets received in an MI divided by the minimum delay of packets received in the same MI, unit: ms/ms.
- Delay gradient: indicates the increasing/decreasing trend of the delay.
- Delay average minimum difference: average delay of packets received in an MI minus the minimum delay of packets received in the same MI, unit: ms.
- Packet interarrival time: mean interarrival time of packets received in an MI, unit: ms.
- Packet jitter: standard deviation of the interarrival time of packets received in an MI, unit: ms.
- Packet loss ratio: probability of packet loss in an MI, unit: packet/packet.
- Average number of lost packets: average number of lost packets given that a loss occurs, unit: packet.
- RTT: the round-trip time in an MI.
- Packet loss rate: the packet loss rate in an MI.

**Emulated Dataset:** This repo will also release a dataset from 30000+ emulated test calls which, in addition to the aforementioned data, contains ground truth information about the bottleneck link between the sender and receiver, namely the bottleneck capacity. In this dataset, the ground truth capacity of the bottleneck is randomly varied throughout the duration of the test call to generate a diverse set of trajectories. Their network dynamics may not occur in the real world but are nevertheless important to enhance state-action space coverage and aid in learning generalizable BWE policies.
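The feature definitions above can be made concrete with a small sketch. It is only an illustration: the dataset ships precomputed observation vectors, so the `Packet` record, the function names, the dictionary keys, and the choice of population standard deviation for jitter are all assumptions made here, not part of the released data or its tooling. The code computes a handful of the listed features for the most recent 100ms, 500ms and 1000ms monitor intervals of one hypothetical packet stream.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Monitor interval lengths taken from the description above.
MI_LENGTHS_MS = {"short": 100, "median": 500, "long": 1000}


@dataclass
class Packet:
    # Hypothetical per-packet record; not a structure provided by the dataset.
    arrival_ms: float   # receive timestamp at the client
    delay_ms: float     # delay measured for this packet
    size_bytes: int


def mi_features(packets, mi_ms, min_seen_delay_ms):
    """Compute a subset of the per-MI features from their textual definitions."""
    if not packets:
        return None  # no packets received in this MI
    delays = [p.delay_ms for p in packets]
    total_bytes = sum(p.size_bytes for p in packets)
    arrivals = sorted(p.arrival_ms for p in packets)
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    avg_delay = mean(delays)
    min_delay = min(delays)
    # "Observed so far" includes the current MI.
    min_seen = min(min_seen_delay_ms, min_delay)
    return {
        "receiving_rate_bps": total_bytes * 8 * 1000 / mi_ms,
        "received_packets": len(packets),
        "received_bytes": total_bytes,
        "queuing_delay_ms": avg_delay - min_seen,
        "min_seen_delay_ms": min_seen,
        # Guard against a zero minimum delay (assumption: report 0.0).
        "delay_ratio": avg_delay / min_delay if min_delay > 0 else 0.0,
        "delay_avg_min_diff_ms": avg_delay - min_delay,
        "interarrival_ms": mean(gaps) if gaps else 0.0,
        # Assumption: population standard deviation of interarrival times.
        "packet_jitter_ms": pstdev(gaps) if gaps else 0.0,
    }


def observe(packets, now_ms, min_seen_delay_ms):
    """Features over the most recent short, median and long term MIs."""
    obs = {}
    for name, length in MI_LENGTHS_MS.items():
        window = [p for p in packets
                  if now_ms - length < p.arrival_ms <= now_ms]
        obs[name] = mi_features(window, length, min_seen_delay_ms)
    return obs


# Synthetic stream: one 1000-byte packet every 25 ms for one second.
example_packets = [
    Packet(arrival_ms=25.0 * i, delay_ms=20.0 + 2.0 * (i % 4), size_bytes=1000)
    for i in range(1, 41)
]
example_obs = observe(example_packets, now_ms=1000.0, min_seen_delay_ms=18.0)
for interval, feats in example_obs.items():
    print(interval, feats)
```

Computing every feature once per interval length and concatenating the three results mirrors the 17 x 3 = 51 arithmetic in the description. The order in which the released vectors lay out features and intervals is not stated in the text, so this sketch keeps the three intervals in a dictionary rather than guessing an index layout.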

Provider:
maas
Created:
2025-05-22