cosmos-openvid-1m

ModelScope community; updated 2025-12-05; indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/fal/cosmos-openvid-1m
Resource description:
# Cosmos-Tokenized OpenVid-1M

[Cosmos-Tokenized](https://github.com/NVIDIA/Cosmos-Tokenizer) [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M)

# How to use

Shards are stored in Parquet format. Each shard has 4 columns: `serialized_latent`, `caption`, `fps`, `video`.

- `serialized_latent` is the latent tensor of the video, serialized with `torch.save()`. Use the following function to deserialize it:

  ```python
  import io
  from typing import Optional

  import torch


  def deserialize_tensor(
      serialized_tensor: bytes, device: Optional[str] = None
  ) -> torch.Tensor:
      # weights_only=True restricts unpickling to tensors and plain containers.
      return torch.load(
          io.BytesIO(serialized_tensor),
          weights_only=True,
          map_location=torch.device(device) if device else None,
      )
  ```

- `caption` is the caption of the video.
- `fps` is the frame rate of the video.
- `video` is the name of the video. The original video can be found in the [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M) dataset.

Example code to read the shards:

```python
import io
import json
from typing import Optional

import torch
import pandas as pd


def read_shards(type: str, split: str):
    # type: "discrete" or "continuous"
    with open(f"{type}/{split}/index.json") as f:
        index = json.load(f)

    # Each index entry names one parquet shard in the same directory.
    for shard in index["shards"]:
        shard_name = shard["raw_data"]["basename"]
        yield pd.read_parquet(f"{type}/{split}/{shard_name}")


def deserialize_tensor(
    serialized_tensor: bytes, device: Optional[str] = None
) -> torch.Tensor:
    return torch.load(
        io.BytesIO(serialized_tensor),
        weights_only=True,
        map_location=torch.device(device) if device else None,
    )


for shard in read_shards("discrete", "train"):
    for i, row in shard.iterrows():
        latent = deserialize_tensor(row["serialized_latent"])
        caption = row["caption"]
        fps = row["fps"]

        print(latent.shape)
        print(caption)
        print(fps)
```

To decode the latents, you need to install the Cosmos Tokenizer.

```bash
git clone https://github.com/NVIDIA/Cosmos-Tokenizer.git
cd Cosmos-Tokenizer
apt-get install -y ffmpeg
pip install -e .
```

Download the pretrained checkpoints.

```python
import os

from huggingface_hub import login, snapshot_download


def download_pretrained_ckpts(local_dir: str, model_name: str):
    """Download pretrained checkpoints from huggingface."""
    login()
    os.makedirs(local_dir, exist_ok=True)
    snapshot_download(repo_id=f"nvidia/{model_name}", local_dir=local_dir)
```

Use the code below to get the decoder.

```python
import os

from cosmos_tokenizer.video_lib import CausalVideoTokenizer


def get_decoder(model_name: str = "Cosmos-Tokenizer-DV4x8x8"):
    """Get the decoder for the given model name.

    model_name can be "Cosmos-Tokenizer-DV4x8x8", "Cosmos-Tokenizer-DV8x8x8",
    or "Cosmos-Tokenizer-DV8x16x16".
    """
    local_dir = f"./pretrained_ckpts/{model_name}"
    # Download the checkpoint only if it is not already on disk.
    if not os.path.exists(local_dir):
        download_pretrained_ckpts(local_dir, model_name)
    decoder = CausalVideoTokenizer(checkpoint_dec=f"{local_dir}/decoder.jit")
    return decoder
```

The decoded video is in the range [-1, 1]. You need to unclamp it to get `uint8` values in the range [0..255].

```python
import torch
import numpy as np

_UINT8_MAX_F = float(torch.iinfo(torch.uint8).max)


def unclamp_video(input_tensor: torch.Tensor) -> np.ndarray:
    """Unclamps tensor in [-1,1] to video(dtype=np.uint8) in range [0..255]."""
    tensor = (input_tensor.float() + 1.0) / 2.0
    tensor = tensor.clamp(0, 1).cpu().numpy()
    # Adding 0.5 before the truncating cast rounds to the nearest integer.
    return (tensor * _UINT8_MAX_F + 0.5).astype(np.uint8)
```

Example code to decode and save the video with its caption. It reuses `read_shards`, `deserialize_tensor`, `get_decoder`, and `unclamp_video` from the snippets above.

```python
import json
import os

from torchvision.io import write_video

output_dir = "./output"
os.makedirs(output_dir, exist_ok=True)
decoder = get_decoder()

for shard in read_shards("discrete", "train"):
    for i, row in shard.iterrows():
        latent = deserialize_tensor(row["serialized_latent"])
        caption = row["caption"]
        fps = row["fps"]

        # Squeeze/unsqueeze because the decoder expects a batch of videos.
        decoded_video = decoder.decode(latent.unsqueeze(0)).squeeze(0)

        # [C, T, H, W] -> [T, H, W, C]
        video = decoded_video.permute(1, 2, 3, 0)

        # Unclamp the video to get it in range [0..255].
        video = unclamp_video(video)

        # Write the video to disk.
        write_video(os.path.join(output_dir, f"{i:09d}.mp4"), video, fps=fps)

        # Write the caption to disk.
        with open(os.path.join(output_dir, f"{i:09d}.json"), "w") as f:
            json.dump({"caption": caption, "fps": fps}, f)
```
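Before loading any tensors, it can help to confirm that every shard named in `index.json` is actually present on disk. The sketch below is a dependency-free check that assumes only the index layout used by `read_shards` above (`index["shards"][n]["raw_data"]["basename"]`). The helper names `list_shard_paths` and `missing_shards` and the shard file names in the demo are hypothetical, chosen for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def list_shard_paths(root: Path, kind: str, split: str) -> List[Path]:
    """Return the shard paths listed in <root>/<kind>/<split>/index.json."""
    split_dir = root / kind / split
    with open(split_dir / "index.json", encoding="utf-8") as f:
        index = json.load(f)
    # Same key layout that read_shards relies on.
    return [split_dir / shard["raw_data"]["basename"] for shard in index["shards"]]


def missing_shards(root: Path, kind: str, split: str) -> List[Path]:
    """Return the indexed shards that do not exist on disk."""
    return [p for p in list_shard_paths(root, kind, split) if not p.is_file()]


# Demo on a throwaway directory with a made-up index (file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    split_dir = root / "discrete" / "train"
    split_dir.mkdir(parents=True)
    fake_index = {
        "shards": [
            {"raw_data": {"basename": "shard.00000.parquet"}},
            {"raw_data": {"basename": "shard.00001.parquet"}},
        ]
    }
    (split_dir / "index.json").write_text(json.dumps(fake_index), encoding="utf-8")
    # Only the first shard is "downloaded" in this demo.
    (split_dir / "shard.00000.parquet").touch()

    demo_missing = [p.name for p in missing_shards(root, "discrete", "train")]
    print(demo_missing)  # ['shard.00001.parquet']
```

On a real download, calling `missing_shards(Path("."), "discrete", "train")` from the dataset root should return an empty list if the split was fetched completely.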
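The rescaling done by `unclamp_video` is easy to verify by hand. As a minimal sketch, the scalar function below (the name `unclamp_value` is hypothetical) applies the same three steps to a single float: shift and scale from [-1, 1] to [0, 1], clamp, then multiply by 255 and add 0.5 so that the truncating integer cast rounds to the nearest value.

```python
def unclamp_value(x: float) -> int:
    """Scalar version of the mapping applied element-wise by unclamp_video."""
    scaled = (x + 1.0) / 2.0                # [-1, 1] -> [0, 1]
    clamped = min(max(scaled, 0.0), 1.0)    # out-of-range values saturate
    return int(clamped * 255.0 + 0.5)       # truncation after +0.5 rounds to nearest


for x in (-1.0, -0.5, 0.0, 0.5, 1.0, 1.7):
    print(x, unclamp_value(x))
# -1.0 0
# -0.5 64
# 0.0 128
# 0.5 191
# 1.0 255
# 1.7 255
```

Note that a decoder output of exactly 0.0 maps to 128 rather than 127, and any overshoot beyond [-1, 1] is clipped instead of wrapping around.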

Provider:
maas
Created:
2025-07-07
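Background overview
This dataset is Cosmos-Tokenized OpenVid-1M. It is stored in Parquet format with four columns (serialized latent, caption, fps, and video name) and is intended for decoding videos back from their latents. It is released under the Apache 2.0 license and comes with a concrete usage guide that covers everything from reading the data to producing videos.
The summary above was collected and generated by 遇见数据集.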