This spreadsheet contains all data produced through the tumor cell segmentation project achieved through machine learning and employing Imaris, and contains additional statistics that may be of interest to researchers.

NIAID Data Ecosystem, indexed 2026-03-14

Download link:

https://figshare.com/articles/dataset/This_spreadsheet_contains_all_data_produced_through_the_tumor_cell_segmentation_project_achieved_through_machine_learning_and_employing_Imaris_and_contains_additional_statistics_that_may_be_of_interest_to_researchers_/21316839

Resource description:

Worksheets contain general statistics and fitting information for all treatments, genotypes, patients, and stages we interrogated. Calculation of DNA content (ploidy), the antecedent stromal controls, and general statistics for all conditions are included in their own worksheets in this file. Each worksheet’s contents are described below:

Tab A. Stromal DNA controls. This worksheet contains DNA-content data for segmented “stromal” (non-tumor) nuclei associated with each Image stack ID, which is directly related to Tab B, Tumor Cell Data. Nucleus ID, the ID number given to each nucleus segmented by the Imaris software. Nuclear Area, the surface area of the nuclear segmentation object, an isovolume based on DNA thresholds. Nuclear Channel 2 intensity mean, the total intensity of CK7 IF inside the surface, which is expected to be an insignificant value for non-tumor cells. Channel 3 Nuclear sum, the total intensity of DNA IF inside the surface. Nuclear voxels, the number of voxels counted by Imaris software inside the surface. Nuclear volume, the volume of the surface. Channel 3 Background Estimated, the per-voxel value used for background subtraction for a given image stack. Normal Nuclei BS DNA, the total DNA IF intensity, column F, minus the total of all background inside the surface (voxels × background per voxel). Voxel volume, the local voxel volume reported by Imaris software.

Tab B. Tumor Cell Data. Patient deID, the identifier given to each set of paired samples from a patient specimen. Genotype, EGFR+ or KRAS+ genotyping information provided by TTAB. Disease Stage, staging information on the specimen provided by TTAB. Differentiation, description of the pathological classification of differentiation, provided by TTAB. Image Stack ID, the unique filename identifier associated with each image stack.
Growth Phenotypes Observed in Stack, related to Fig 4 and discussion, whether Sheet, Solid, Sparse, or giant cell phenotypes were specified as existing within a given image stack. Cell ID, the unique identifier given to each cell model by Imaris. Cytoplasm Channel 2 Intensity Mean, the intensity per voxel on average within the cell body, but not contained within the nuclear model, if present. Cytoplasm Volume, the volume of the cell body surface model, less the volume inside the nuclear model, if present. Cell Ellipticity (Oblate or Prolate), shape parameters provided by Imaris for the given cell surface model. Cell Sphericity, shape parameter provided by Imaris for the given cell surface. Cell volume, volume of the cell surface model. Columns P-R and W, similar to cell data, but for nuclear surface models, when present. Nuclear Channel 1–2 Intensity Means, control data showing average proSPC, Lamin A+C, or CK7 intensity contained within the nucleus. Channel 3 Nuclear Intensity Sum, total DNA IF signal contained within the nuclear surface model. Grey fields, multiple nuclei. Nuclear voxels, the number of voxels counted by Imaris within the nuclear surface model. Nuclear volume, volume of the nuclear surface model. # Nuclei, number of individual nuclei identified as present within the model as reported by Imaris. Channel 3 Background, the DNA IF designated as background for the stack and subtracted before DNA quantification. BS Nuclear DNA, calculation of the total DNA present inside the nuclear model, if present. Normal Mode of Gaussian Fit, related directly to Tab A for the same stack file ID, the center of the Gaussian peak fit to the DNA intensities of all non-tumor cells segmented in the same image stack. DNA number, the ratio of the total background-subtracted DNA signal inside the nuclear volume to the mode of the total intensities of all the non-tumor controls.
Stated another way, how many non-tumor “genomes” worth of DNA signal are measured inside the tumor nuclear volume. Estimated ploidy, twice the number of estimated genomes; assumes two chromosomes per genome equivalent of DNA signal from column AB. Nucleus S:V Ratio, the ratio of the nuclear model surface area to the volume contained within that surface, a shape parameter. N:C ratio, the ratio of the nuclear volume to the cytoplasmic volume. Cell S:V, the ratio of cell model surface area to the volume contained within that surface, a shape parameter. Cell Area:Nucleus Area, the ratio of cell model surface area to nuclear model surface area, relevant to some flux models. Cell volume/ploidy, the ratio of the volume of the cell body model to the estimated ploidy, or an estimate of the amount of cell matter devoted to each theoretical chromosome. Conditional formatting, related to Fig 3, S12 and S13 Figs. Ploidy/Nuclear volume, the number of theoretical chromosomes, or estimated ploidy, per amount of nuclear volume within a nuclear model. Cell or Nucleus prolate:oblate ratio, a parameter we used to estimate the relative degree of elongation versus flattening, related to S15 Fig. Columns AM-AP, explicit counts of sub-, proportional, or supraproportional and near-WT cells, related to Fig 3, S12 and S13 Figs and referenced by Tabs D-F. NM, not measured, as in the case of stack 4850T4, where nuclear models were not completed and these cell models are not included in any DNA-based analyses.

Tab C. Normal Cell Data. Stack ID, the unique file name identifier for normal sample image stacks, stained with proSPC and CK7. Cell model volumes, CV, nuclear model volumes, NV, and ratios of nuclear to cytoplasmic volumes, NC, binned for all patients. Columns F-J, collected numbers of samples, related to Tab E. Columns N-T, summary statistics for all normal distant control cells segmented (see Methods), cited in the main text for mean and SD.

Tab D. Genotype statistics.
Columns A-P, numbers of stacks imaged, depth of imaged stacks segmented, and numbers of cells as well as associated data binned by genotype. Statistics are calculated directly from Tabs B and E. Columns Q-X, data by stage, related to Fig 3, S12 and S13 Figs for sub-, proportional, and supraproportional cell classification and quantification.

Tab E. Patient statistics. Columns A-D, related to columns A-D of Tab B. Columns E-J, reporting of imaging and segmentation sampling depth. Columns Q-AB, left-to-right cell formulae walk through the decision rule we used to classify patients according to the degree of proportionality of cell volume to DNA. Columns Q-S, uniform split of cells between three bins, used for the Chi-square analysis (column W) relative to columns T-V. Column X, binomial “coin-flipping” test to determine whether, if the distribution from columns T-V was uneven, one or the other category was significantly enriched. Column Y, decision from statistical Chi-square and binomial tests. Abnormal, in this context, not like a fusion of wild-type cells. Proportional, more similar to a fusion of wild-type cells. Columns Z and AA, the degree of fold-change for each category, Sub- or Supra-, relative to the other. Column AB, final classification based on the decision rule, see Methods.

Tab F. Stage Statistics. Genotype, whether normal, EGFR+, or KRAS+ as indicated. Stage, 0 = normal, 1, 2, 3, binnings of stages, where 1 contains 1A and 1B (related to Tab B). Number of cell models, counts of cell models in each genotype and stage, calculated directly from Tabs B and C. Fold change mean TCV/mean NCV, ratio of the mean tumor cell volume to mean normal cell volume for each treatment. Fold change SE, standard error of the fold change, estimated from a Taylor approximation for two variables (see Methods). Columns G-AA, summary statistics, related to Tabs B-C.
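The Tab F fold-change columns described above can be illustrated with a minimal sketch. It assumes the usual first-order Taylor (delta-method) approximation for a ratio of two independent means, SE(R) ≈ |R| · sqrt((SE_T / mean_T)² + (SE_N / mean_N)²); the exact form the authors used is given in their Methods and may differ, for example by including a covariance term. The function name and the input numbers are hypothetical and are not taken from the spreadsheet.

```python
import math


def fold_change_with_se(mean_t, se_t, mean_n, se_n):
    """Return (fold change, standard error) for mean_t / mean_n.

    Uses a first-order Taylor expansion of the ratio and assumes the two
    means are independent, so no covariance term appears (an assumption
    of this sketch, not a statement about the authors' formula).
    """
    if mean_t == 0 or mean_n == 0:
        raise ValueError("both means must be non-zero")
    ratio = mean_t / mean_n
    # Relative variances of numerator and denominator add to first order.
    rel_var = (se_t / mean_t) ** 2 + (se_n / mean_n) ** 2
    return ratio, abs(ratio) * math.sqrt(rel_var)


# Hypothetical mean tumor (TCV) and normal (NCV) cell volumes with their SEs.
ratio, se = fold_change_with_se(mean_t=1500.0, se_t=60.0, mean_n=500.0, se_n=10.0)
print(f"fold change = {ratio:.2f}, SE = {se:.3f}")
```

Working with relative errors keeps the calculation independent of the volume units, provided both means use the same unit.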
Columns AN-AU, calculation of relative contributions of each cell category, sub-, proportional, or supraproportional, calculated from Tab B (see formulas).

Tab G. Model statistics and fitting. This tab contains summary statistics as reported by Igor Pro software for each parameter, treatment, and ploidy type (all cells = all ploidies; euploids only = 1.6–4.4n estimated ploidy), and is relevant to quantitation derived from data in Tabs B-C and from Imaris software outputs. Columns A-D, parameter of interest, treatments, and relevant species. # of models, the number of models included in the analysis. Max, the maximum value of the parameter in the analysis. IQR, the interquartile range for the data used in the analysis. Bin width, specified bin width for a histogram of data, based on the Freedman-Diaconis rule. Number of bins, the number of histogram bins used for plotting, derived from column H, and also the sample size used for distribution fitting. Columns J-L, summary statistics for datasets reported by Igor Pro. Columns M-AA, details on initial guesses, fitting function attempted, rationale for choice of best fit (column T), fitting function chosen (column S and green fields are relevant), and best-fit parameters for the chosen fit with their reported errors (columns U-Z). See also Methods for the form of the fitting function. Cells V56-W57 show the source of the normal 2n peak values used for comparison to tumor cells in the main text, e.g., 582 is the center of the first Gaussian from the mixture model used to fit normal AT2 cell data, cell V56, and 179 was the width1 parameter, which is .

Tab H. Growth phenotypes observed. Explicit quantitation of the frequency with which segmented stacks contained the specified growth patterns described in Fig 4 and the main text. Related to Tab B.r.

Tab I. CK7 in cells > 2 pL. Cited in the main text, and relevant to the drop in high-intensity CK7 staining seen in very large tumor cells, related directly to S25 Fig.
Columns A-B, data copied from Tab B for all rows containing cell and nuclear models. Columns C-F, counts of the respective categories of cells with greater than or less than half-max cytoplasmic CK7 intensity (127.5 = 1/2 × 255 8-bit units, see formulas), or cells with greater than or less than 2 pL volume (2000 fL = 2000 μm³). Columns H-N, quantitation of the percentages reflected in each region of S25 Fig and cited in the main text. (XLSX)
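As a rough illustration of how several per-cell derived columns described above relate to one another, the following sketch chains the background subtraction stated for Tab A (total DNA IF minus voxels × background per voxel), the DNA number and estimated ploidy from Tab B, and the two Tab I thresholds (127.5 intensity units and 2000 fL). It assumes that Tab B applies the same background subtraction as Tab A and that values exactly at a threshold count as "low" or "small"; the description does not settle either point. All function names and example numbers are hypothetical.

```python
HALF_MAX_CK7 = 127.5    # 1/2 × 255 8-bit units, as stated for Tab I
LARGE_CELL_FL = 2000.0  # 2 pL = 2000 fL, as stated for Tab I


def background_subtracted_dna(channel3_sum, n_voxels, background_per_voxel):
    """Total DNA IF inside the surface minus voxels × background per voxel."""
    return channel3_sum - n_voxels * background_per_voxel


def dna_number(bs_nuclear_dna, normal_mode):
    """DNA signal expressed in units of the non-tumor Gaussian peak center."""
    if normal_mode <= 0:
        raise ValueError("normal_mode must be positive")
    return bs_nuclear_dna / normal_mode


def estimated_ploidy(dna_num):
    """Twice the number of estimated genomes."""
    return 2.0 * dna_num


def ck7_volume_category(ck7_mean, cell_volume_fl):
    """Classify a cell by CK7 intensity and volume.

    Ties at either threshold fall into the lower category (an assumption
    of this sketch).
    """
    ck7 = "high CK7" if ck7_mean > HALF_MAX_CK7 else "low CK7"
    size = "> 2 pL" if cell_volume_fl > LARGE_CELL_FL else "<= 2 pL"
    return ck7, size


# Hypothetical cell, for illustration only.
bs_dna = background_subtracted_dna(1_200_000, 10_000, 20)
n_genomes = dna_number(bs_dna, normal_mode=500_000)
print(bs_dna, n_genomes, estimated_ploidy(n_genomes))
print(ck7_volume_category(ck7_mean=90.0, cell_volume_fl=2500.0))
```

Keeping each step as its own function mirrors the column-by-column layout of the worksheets, so an intermediate value can be checked against the corresponding spreadsheet column.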

Created:

2022-10-06