cosmos-openvid-1m

Name: cosmos-openvid-1m
Creator: maas
Published: 2025-12-05 11:57:44
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2025-07-12.

Download link:

https://modelscope.cn/datasets/fal/cosmos-openvid-1m


Resource overview:

# Cosmos-Tokenized OpenVid-1M

[Cosmos-Tokenized](https://github.com/NVIDIA/Cosmos-Tokenizer) [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M)

# How to use

Shards are stored in Parquet format. Each shard has four columns: `serialized_latent`, `caption`, `fps`, and `video`.

- `serialized_latent` is the latent vector of the video, serialized with `torch.save()`. Use the following function to deserialize it:

```python
def deserialize_tensor(
    serialized_tensor: bytes, device: Optional[str] = None
) -> torch.Tensor:
    return torch.load(
        io.BytesIO(serialized_tensor),
        weights_only=True,
        map_location=torch.device(device) if device else None,
    )
```

- `caption` is the caption of the video.
- `fps` is the frame rate of the video.
- `video` is the file name of the video; the original video can be found in the [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M) dataset.

Example code to read the shards:

```python
import io
import json
from typing import Optional

import torch
import pandas as pd


def read_shards(type: str, split: str):
    # type: "discrete" or "continuous"
    with open(f"{type}/{split}/index.json") as f:
        index = json.load(f)

    for shard in index["shards"]:
        shard_name = shard["raw_data"]["basename"]
        yield pd.read_parquet(f"{type}/{split}/{shard_name}")


def deserialize_tensor(
    serialized_tensor: bytes, device: Optional[str] = None
) -> torch.Tensor:
    return torch.load(
        io.BytesIO(serialized_tensor),
        weights_only=True,
        map_location=torch.device(device) if device else None,
    )


for shard in read_shards("discrete", "train"):
    for i, row in shard.iterrows():
        latent = deserialize_tensor(row["serialized_latent"])
        caption = row["caption"]
        fps = row["fps"]

        print(latent.shape)
        print(caption)
        print(fps)
```

To decode the latents, install the Cosmos Tokenizer:

```bash
git clone https://github.com/NVIDIA/Cosmos-Tokenizer.git
cd Cosmos-Tokenizer
apt-get install -y ffmpeg
pip install -e .
```

Download the pretrained checkpoints:

```python
import os

from huggingface_hub import login, snapshot_download


def download_pretrained_ckpts(local_dir: str, model_name: str):
    """Download pretrained checkpoints from huggingface."""

    login()
    os.makedirs(local_dir, exist_ok=True)
    snapshot_download(repo_id=f"nvidia/{model_name}", local_dir=local_dir)
```

Use the code below to get the decoder:

```python
import os

from cosmos_tokenizer.video_lib import CausalVideoTokenizer


def get_decoder(model_name: str = "Cosmos-Tokenizer-DV4x8x8"):
    """Get the decoder for the given model name.
    model_name can be "Cosmos-Tokenizer-DV4x8x8", "Cosmos-Tokenizer-DV8x8x8", or "Cosmos-Tokenizer-DV8x16x16"."""

    local_dir = f"./pretrained_ckpts/{model_name}"
    # Download the checkpoint only if it is not already on disk.
    if not os.path.exists(local_dir):
        download_pretrained_ckpts(local_dir, model_name)
    decoder = CausalVideoTokenizer(checkpoint_dec=f"{local_dir}/decoder.jit")
    return decoder
```

The decoded video is in the range [-1, 1]. Unclamp it to get `uint8` values in the range [0..255]:

```python
import torch
import numpy as np

_UINT8_MAX_F = float(torch.iinfo(torch.uint8).max)


def unclamp_video(input_tensor: torch.Tensor) -> np.ndarray:
    """Unclamps tensor in [-1,1] to video(dtype=np.uint8) in range [0..255]."""
    tensor = (input_tensor.float() + 1.0) / 2.0
    tensor = tensor.clamp(0, 1).cpu().numpy()
    return (tensor * _UINT8_MAX_F + 0.5).astype(np.uint8)
```

Example code to decode each video and save it with its caption:

```python
import json
import os

from torchvision.io import write_video

output_dir = "./output"
# Make sure the output directory exists before writing to it.
os.makedirs(output_dir, exist_ok=True)
decoder = get_decoder()

for shard in read_shards("discrete", "train"):
    for i, row in shard.iterrows():
        latent = deserialize_tensor(row["serialized_latent"])
        caption = row["caption"]
        fps = row["fps"]

        # Squeeze/unsqueeze because the decoder expects a batch of videos.
        decoded_video = decoder.decode(latent.unsqueeze(0)).squeeze(0)

        # [C, T, H, W] -> [T, H, W, C]
        video = decoded_video.permute(1, 2, 3, 0)

        # Unclamp the video to get it in range [0..255].
        video = unclamp_video(video)

        # Write the video to disk.
        write_video(os.path.join(output_dir, f"{i:09d}.mp4"), video, fps=fps)

        # Write the caption to disk.
        with open(os.path.join(output_dir, f"{i:09d}.json"), "w") as f:
            json.dump({"caption": caption, "fps": fps}, f)
```
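To make the unclamping arithmetic concrete, here is a minimal scalar sketch of the mapping that `unclamp_video` applies to each element. The helper name `unclamp_value` is hypothetical and not part of the dataset card or the Cosmos Tokenizer; it uses plain Python floats instead of tensors, so it illustrates the formula only and is not a replacement for the tensor version.

```python
def unclamp_value(x: float) -> int:
    """Map one decoder output value from [-1, 1] to an integer in [0, 255].

    Hypothetical scalar counterpart of the card's unclamp_video():
    shift and scale to [0, 1], clamp out-of-range values, then scale
    to [0, 255] and add 0.5 so that truncation rounds to nearest.
    """
    scaled = (x + 1.0) / 2.0              # [-1, 1] -> [0, 1]
    scaled = min(max(scaled, 0.0), 1.0)   # clamp values outside the range
    return int(scaled * 255.0 + 0.5)      # truncation after +0.5 rounds to nearest


if __name__ == "__main__":
    # The last two inputs lie outside [-1, 1] and are clamped.
    for value in (-1.0, 0.0, 1.0, 2.5, -7.0):
        print(value, "->", unclamp_value(value))
```

The extremes map as expected: -1.0 becomes 0, 0.0 becomes 128, and 1.0 becomes 255. Inputs outside [-1, 1] saturate at 0 or 255 instead of wrapping around when cast to `uint8`.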


Provider:

maas

Created:

2025-07-07

Collected summary

Dataset introduction