Neufert 4.0

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14223941
Description:
Dataset Provenance

Based on the Swiss Dwellings Dataset by Archilyse | https://doi.org/10.5281/zenodo.7788422

Matthias Standfest, Michael Franzen, Yvonne Schröder, Luis Gonzalez Medina, Yarilo Villanueva Hernandez, Jan Hendrik Buck, Yen-Ling Tan, Milena Niedzwiecka, & Rachele Colmegna. (2022). Swiss Dwellings: A large dataset of apartment models including aggregated geolocation-based simulation results covering viewshed, natural light, traffic noise, centrality and geometric analysis (3.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7788422

Codebook

"geometries_w_outlines.csv" contains the geometries of n = 20,419 residential apartments (see https://zenodo.org/records/7788422) as well as the outlines of these apartments (entity_subtype = "APARTMENT", entity_type = "outline"). For details of data processing and cleaning, see the sections below.

label: description
apartment_id: The ID of the apartment (for features, areas). Note: an apartment ID is only unique per site.
area_id: The ID of the area in which the element is spatially contained (for features)
building_id: The ID of the building
entity_subtype: The entity's sub-type (e.g. APARTMENT, WALL)
entity_type: The entity type (outline, area, separator, opening, feature)
floor_id: The ID of the floor
geometry: The element's geometry as a WKT geometry in meters. The geometry is given in the site's local coordinate system, i.e. the positions of elements of the same site are correct with respect to each other. The +y direction points northwards, the +x direction points eastwards.
site_id: The ID of the site
unit_id: The ID of the unit in which the element is spatially contained (for features, areas)
bfa: A unique apartment ID created as a concatenation of the [building_floor_apartment] IDs
walls: A string representing whether a given line segment of the apartment outline geometry represents an outer (O) or inner (I) wall, e.g. ((OOIIO)) for an outline with 5 line segments. In the case of a complex outline (e.g., a house with an inner atrium), the syntax matches WKT: ((OOOO),(OOOO))

"apartment_simulations.csv" comprises per-apartment features, calculated from the original data hosted at https://zenodo.org/records/7788422.

label: description
No_floor: Number of the floor the apartment is on
total_area: Total apartment area in sqm
number_of_rooms: Number of living-, bed-, lounge- (etc.) rooms
room_area_total: Total area of the rooms in sqm
room_area_ratio: Ratio of the room_area_total to the total_area
room_area_mean: Average of room areas
room_area_std: Standard deviation of room areas
room_sunlight: The average amount of direct daylight received by the rooms (klx)
room_noise: The average of daytime and night-time car traffic noise received at the windows of the rooms (dB(A))
largest_room_area_total: Largest room area in sqm
largest_room_area_ratio: Ratio of the largest_room_area_total to the total_area
largest_room_sunlight: Amount of direct daylight received by the largest room (klx)
largest_room_noise: Daytime and night-time car traffic noise received at the windows of the largest room (dB(A))
kitchen_number: Number of kitchens
kitchen_area_total: Total area of the kitchens in sqm
kitchen_area_ratio: Ratio of the kitchen_area_total to the total_area
kitchen_area_mean: Average of kitchen areas
kitchen_area_std: Standard deviation of kitchen areas
kitchen_sunlight: The average amount of direct daylight received by the kitchens (klx)
kitchen_noise: The average of daytime and night-time car traffic noise received at the windows of the kitchens (dB(A))
bathroom_number: Number of bathrooms
bathroom_area_total: Total area of the bathrooms in sqm
bathroom_area_ratio: Ratio of the bathroom_area_total to the total_area
bathroom_area_mean: Average of bathroom areas
bathroom_area_std: Standard deviation of bathroom areas
bathroom_sunlight: The average amount of direct daylight received by the bathrooms (klx)
bathroom_noise: The average of daytime and night-time car traffic noise received at the windows of the bathrooms (dB(A))
corridor_number: Number of corridors
corridor_area_total: Total area of the corridors in sqm
corridor_area_ratio: Ratio of the corridor_area_total to the total_area
corridor_area_mean: Average of corridor areas
corridor_area_std: Standard deviation of corridor areas
corridor_sunlight: The average amount of direct daylight received by the corridors (klx)
corridor_noise: The average of daytime and night-time car traffic noise received at the windows of the corridors (dB(A))
site_id: Same as in geometries_w_outlines.csv
building_id: Same as in geometries_w_outlines.csv
floor_id: Same as in geometries_w_outlines.csv
apartment_id: Same as in geometries_w_outlines.csv
bfa: Same as in geometries_w_outlines.csv

Data processing

The original dataset comprised data from a total of N = 37,174 apartments. In the first step, we excluded flats that were either underground or above the 6th floor (n = 2,271), had a total area outside the 25-200 sqm range (n = 1,094), lacked data on room noise (n = 4) or room sunlight (n = 4), or had a value of zero for either of these two variables (n = 810). As only a small number of flats (n = 387) had more than five rooms, these were also excluded. This resulted in a sample size of N = 32,990 apartments.

Shape clusters

To identify clusters that contain apartments with the same layout, we devised the following algorithm: First, to account for potential rotation of the apartments, we oriented them along the horizontal (x-axis) and the vertical (y-axis). This was done by aligning the longest polygon edge within each apartment with the horizontal. Next, we converted the vector graphics representation of the apartments to a 224x224 black-and-white raster representation that contained only area geometries (rooms, corridors, shafts, balconies etc. but not walls, columns, doors, or windows).
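The alignment step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the outline is assumed to be a list of (x, y) vertices without a repeated closing vertex, the function name `align_longest_edge` is hypothetical, and rotating about the first vertex is an arbitrary choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def align_longest_edge(ring: List[Point]) -> List[Point]:
    """Rotate a polygon ring so that its longest edge is parallel to the x-axis.

    Assumption: `ring` lists each vertex once (no repeated closing vertex).
    The rotation is performed about the first vertex of the ring.
    """
    # Pair every vertex with its successor, wrapping around to close the ring.
    edges = list(zip(ring, ring[1:] + ring[:1]))
    (x0, y0), (x1, y1) = max(edges, key=lambda e: math.dist(e[0], e[1]))

    # Angle of the longest edge relative to the +x direction.
    angle = math.atan2(y1 - y0, x1 - x0)
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)

    ox, oy = ring[0]
    return [
        (
            (x - ox) * cos_a - (y - oy) * sin_a + ox,
            (x - ox) * sin_a + (y - oy) * cos_a + oy,
        )
        for x, y in ring
    ]


# A rectangle rotated by 45 degrees; its longest edge runs from (0, 0) to (4, 4).
aligned = align_longest_edge([(0.0, 0.0), (4.0, 4.0), (3.0, 5.0), (-1.0, 1.0)])
```

After the call, the first two vertices of `aligned` share the same y coordinate, so the longest edge lies along the horizontal while all edge lengths are preserved.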
Each such 224x224 matrix contained only values of 0 (black) and 1 (white). These matrices were then compared to obtain a measure of distance between the apartment geometries. The apartments were first grouped according to number of rooms. Within each group, we then selected a random apartment as a reference matrix ($\mathbf{A}$) and calculated the sum of squared differences between this reference and all other apartment matrices ($\mathbf{B}$):

$$SS = \sum_{i = 1}^{n}\sum_{j = 1}^{n}(\mathbf{B}_{ij} - \mathbf{A}_{ij})^2$$

where $n$ = 224, the number of pixels per dimension.

After the initial alignment of the floor plans along the x and y axes, there remained eight possible orientations each floor plan could have. For this reason, each floor plan matrix was transformed into eight different matrices that reflected these rotations and only then compared to the reference using the sum of squares measure. This resulted in eight distance values per floor plan, yielding an eight-dimensional difference space, with each floor plan representing a point in this space and the reference flat positioned at its origin. In order to further expand this difference space, we selected another floor plan as reference and repeated this process. This floor plan was chosen from the middle of the distance distributions in the first iteration.

The resulting 16-dimensional data set was then passed to the Self-Organising Map algorithm (SOM, Kohonen, 1990) with a grid size equal to $\lfloor\sqrt{n}\rfloor$, where $n$ represents the number of floor plans with a given number of rooms. This procedure yielded a grid position for each of the analysed floor plans, clustering them according to shape. The above-described algorithm produced somewhat liberal clusters, occasionally grouping dissimilar floor plans into a single cluster, and so a further step was implemented to refine the clustering.
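The eight-orientation comparison can be illustrated with a short sketch. It is not the authors' implementation: it assumes that the eight orientations are the four 90-degree rotations of a matrix and of its mirror image, and all function names are hypothetical. Plain nested lists are used so that the example has no dependencies.

```python
from typing import List

Matrix = List[List[int]]


def rotate90(m: Matrix) -> Matrix:
    """Rotate a square matrix by 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]


def mirror(m: Matrix) -> Matrix:
    """Mirror a matrix left to right."""
    return [row[::-1] for row in m]


def eight_orientations(m: Matrix) -> List[Matrix]:
    """Assumed orientation set: four rotations of m and four of its mirror image."""
    orientations = []
    for start in (m, mirror(m)):
        current = start
        for _ in range(4):
            orientations.append(current)
            current = rotate90(current)
    return orientations


def sum_of_squares(a: Matrix, b: Matrix) -> int:
    """SS = sum over all pixels of (B_ij - A_ij)^2."""
    return sum(
        (pb - pa) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def orientation_distances(reference: Matrix, candidate: Matrix) -> List[int]:
    """Eight SS values, one per orientation of the candidate floor plan."""
    return [sum_of_squares(reference, o) for o in eight_orientations(candidate)]


# Tiny 2x2 stand-ins for the 224x224 rasters: the candidate is a rotated reference.
reference = [[1, 0], [0, 0]]
candidate = [[0, 0], [0, 1]]
distances = orientation_distances(reference, candidate)
```

Each floor plan contributes one such eight-element vector per reference; with two references this gives the 16 dimensions passed to the SOM.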
Within each of the SOM clusters, we started by designating the first floor plan (its position was arbitrary) as the reference and once again calculated the distance between it and the eight orientations of all other floor plans in the same SOM cluster. If the smallest of the eight distance measures was less than an empirically determined threshold of 2,000, the floor plan in question was added to a cluster defined by the reference floor plan. Next, the first of the remaining floor plans that did not pass the threshold was designated the reference for the next cluster and the process was repeated. Once all floor plans within the same SOM cluster were assigned, the algorithm was run on the next SOM cluster. While the "refinement" algorithm could have been applied to the raw, unclustered data, the benefit of first creating the SOM clusters was a massive reduction of apartment-to-apartment comparisons and thus of computational resources. This procedure yielded 11,173 rotation- and translation-invariant clusters of geometrically identical floor plans. The number of floor plans per cluster ranged from 1 to 96.

Filtering observations

Once the data was sorted into clusters according to the geometry of floor plans, we removed apartments that were deemed essentially duplicates with respect to the values of average room noise and average room sunlight. These essential duplicates were identified as follows: First, we discretised the noise variable into categories by 5 dB increments and the sunlight variable by increments of 200 lx. Within each cluster and for each level of noise, we then retained one observation at every given level of the discretised sunlight variable (if available). The retained observation was the one with the lowest score on the original, continuous average room sunlight variable. This further reduced the data set to the final size of n = 20,419 essentially unique apartments.
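The refinement and filtering steps can be sketched as follows. This is a hedged illustration rather than the original code: the function names, the `cluster` key, and the choice of bin edges (floor division starting at zero) are assumptions, and `distance` stands for the smallest of the eight orientation distances between two floor plans. Sunlight is given in klx, so the 200 lx increment becomes 0.2.

```python
import math
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def refine_cluster(
    plans: Sequence[T],
    distance: Callable[[T, T], float],
    threshold: float = 2000,
) -> List[List[T]]:
    """Greedily split one SOM cluster into groups of near-identical floor plans."""
    clusters: List[List[T]] = []
    remaining = list(plans)
    while remaining:
        reference = remaining[0]  # first remaining plan defines the next cluster
        members, rest = [reference], []
        for plan in remaining[1:]:
            if distance(reference, plan) < threshold:
                members.append(plan)
            else:
                rest.append(plan)
        clusters.append(members)
        remaining = rest
    return clusters


def filter_duplicates(
    rows: List[Dict[str, float]],
    noise_step: float = 5.0,      # dB
    sunlight_step: float = 0.2,   # klx, i.e. 200 lx
) -> List[Dict[str, float]]:
    """Keep one row per (cluster, noise bin, sunlight bin): the lowest sunlight."""
    kept: Dict[tuple, Dict[str, float]] = {}
    for row in rows:
        key = (
            row["cluster"],  # hypothetical column holding the refined cluster ID
            math.floor(row["room_noise"] / noise_step),
            math.floor(row["room_sunlight"] / sunlight_step),
        )
        if key not in kept or row["room_sunlight"] < kept[key]["room_sunlight"]:
            kept[key] = row
    return list(kept.values())


# Toy data: numbers stand in for floor plans, absolute difference for the distance.
groups = refine_cluster([0, 100, 5000, 5100, 50], lambda a, b: abs(a - b))
unique = filter_duplicates([
    {"cluster": 1, "room_noise": 41.0, "room_sunlight": 0.31},
    {"cluster": 1, "room_noise": 43.0, "room_sunlight": 0.25},
    {"cluster": 1, "room_noise": 47.0, "room_sunlight": 0.25},
    {"cluster": 2, "room_noise": 41.0, "room_sunlight": 0.31},
])
```

In the toy data, the first two rows fall into the same cluster, noise bin, and sunlight bin, so only the one with the lower continuous sunlight value is retained.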
Automatic outline generation

The design heuristics developed within the Neufert 4.0 project offer diverse insights and take into account various parameters relevant to architectural design. However, a critical requirement for these heuristics is the outline of the apartment, which serves as a necessary input for our application. While Archilyse provides vector geometric data for entities such as rooms, walls, doors, and windows, it does not supply the outlines of floors and apartments, making it necessary for us to extract the outlines automatically. Due to missing or erroneous geometric data in the dataset, extracting outlines is not a trivial task.

We automatically extract apartment outlines from the existing entity vector geometry using a multi-step geometric computation process. First, we apply an offset to all entity polygons to fill gaps caused by inaccuracies or missing geometric data. Second, we union all offset polygons to obtain the desired outline. Third, the outline is offset back to its original size. However, the presence of extended walls in the original geometric data results in an irregular outline. To address this, we perform a reverse offset on the outline, which removes extended walls whose size is smaller than the offset distance. The outline is then offset back to its original size, giving the final outline.

As previously mentioned, the original geometric data is highly complex and often contains missing or erroneous elements, so the automatic outline extraction method is not effective for all floor plans. However, obtaining accurate outlines is crucial for subsequent studies. Therefore, the automatically extracted outlines needed to be checked clerically.

Manual outline extraction

While the Archilyse data set contains undoubtedly useful data of high quality, there are instances where the process applied to retrieve floor plan geometries yielded suboptimal results.
The abnormalities included walls extending outside the bounds of the floor plan and geometries labelled as entrances or windows that were disconnected from room geometries. For this reason, we decided to conduct a clerical audit of the data set and employed three graduate assistants who reviewed the data, flagged problematic geometries, and manually drew apartment outlines. To facilitate the audit process, we developed a web application hosted on the Bauhaus-University Weimar's server. This application provided the clerical auditors with visualisation, review, and outline drawing tools, making the entire process less laborious and more efficient.

The audit procedure was as follows: Upon visiting the URL where the web application was hosted, the reviewers were presented with the apartment geometries, always displayed in the context of the floor. The order of presentation was such that the reviewers always reviewed all floor plans of interest on the same floor and within the same building. If a given floor plan was not to be reviewed, because it had been excluded in the data cleaning process described above, it was automatically skipped. The reviewers' first task was to visually inspect the presented floor plan and, if something about it appeared wrong, flag it as problematic. Next, the application displayed the automatically generated floor plan outline (see previous section) as a closed polyline. The reviewers were asked to either approve the outline or, if needed, edit it. This was done by dragging, adding, or removing the vertices of the polyline. Finally, the auditors highlighted inside walls (those that do not form the facade of the containing building) by clicking on the appropriate line segments. Once the reviewers confirmed their review, the data were saved and the next floor plan was presented. After a short period of training, the entire process could be completed in under a minute.
[Figure: Overview of floor plan audit process]

At the end of the review, outlines of all 20,419 floor plans were either approved or amended. Of those floor plans, 268 were flagged as problematic: 559 geometrical elements (e.g., wall segment, door, space) were flagged as "unusual" and four floor plans were marked as "wrong". In seven instances, there were geometrical elements missing from the geometry of the entire floor (e.g., apartments not included in the Archilyse dataset).

References

T. Kohonen, "The self-organizing map," in Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sept. 1990, doi: 10.1109/5.58325.
Created: 2024-12-06