Neufert 4.0
Download link: https://zenodo.org/record/14223941
Dataset
Provenance
Based on the Swiss Dwellings Dataset by Archilyse | https://doi.org/10.5281/zenodo.7788422
Matthias Standfest, Michael Franzen, Yvonne Schröder, Luis Gonzalez Medina, Yarilo Villanueva Hernandez, Jan Hendrik Buck, Yen-Ling Tan, Milena Niedzwiecka, & Rachele Colmegna. (2022). Swiss Dwellings: A large dataset of apartment models including aggregated geolocation-based simulation results covering viewshed, natural light, traffic noise, centrality and geometric analysis (3.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7788422
Codebook
"geometries_w_outlines.csv" contains the geometries of n = 20,419 residential apartments (see https://zenodo.org/records/7788422) as well as outlines of these apartments (entity_subtype = "APARTMENT", entity_type = "outline"). For details of data processing and cleaning, see sections below.
| label | description |
| --- | --- |
| apartment_id | The ID of the apartment (for features, areas). Note: an apartment ID is only unique per site |
| area_id | The ID of the area in which the element is spatially contained (for features) |
| building_id | The ID of the building |
| entity_subtype | The entity’s sub-type (e.g. APARTMENT, WALL) |
| entity_type | The entity type (outline, area, separator, opening, feature) |
| floor_id | The ID of the floor |
| geometry | The element’s geometry as a WKT geometry in meters. The geometry is given in the site’s local coordinate system, i.e. the positions of elements of the same site are correct with respect to each other. The +y direction points northwards, the +x direction points eastwards. |
| site_id | The ID of the site |
| unit_id | The ID of the unit in which the element is spatially contained (for features, areas) |
| bfa | A unique apartment ID created as a concatenation of [building_floor_apartment] IDs |
| walls | A string representing whether a given line segment of the apartment outline geometry represents an outer (O) or inner (I) wall, e.g. ((OOIIO)) for an outline with 5 line segments. In case of a complex outline (e.g., a house with an inner atrium), the syntax matches WKT: ((OOOO),(OOOO)) |
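As an illustration of the `walls` syntax, the sketch below splits such a string into one label string per ring and pairs each label with a line segment of a ring. It assumes that the labels follow the vertex order of the WKT outline and that rings are closed (first vertex repeated at the end); neither detail is confirmed by the codebook. The function names `parse_walls` and `label_segments` are hypothetical.

```python
import re


def parse_walls(walls: str) -> list[str]:
    """Split a walls string such as "((OOIIO))" into one label string per ring."""
    text = walls.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"unexpected walls string: {walls!r}")
    # Drop the outermost parentheses, then collect each "(...)" ring.
    rings = re.findall(r"\(([OI]*)\)", text[1:-1])
    if not rings:
        raise ValueError(f"no rings found in walls string: {walls!r}")
    return rings


def label_segments(ring, labels):
    """Pair each segment of a closed ring with its O/I label.

    Assumption: the i-th label belongs to the segment between the i-th and
    (i+1)-th vertex of the ring, in WKT order.
    """
    segments = list(zip(ring, ring[1:]))
    if len(segments) != len(labels):
        raise ValueError("number of labels does not match number of segments")
    return list(zip(segments, labels))
```

For an outline with an inner atrium, `parse_walls("((OOOO),(OOOO))")` returns two label strings, one for the exterior ring and one for the interior ring.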
"apartment_simulations.csv" comprises per-apartment features, calculated from the original data hosted at https://zenodo.org/records/7788422.
| label | description |
| --- | --- |
| No_floor | Number of the floor the apartment is on |
| total_area | Total apartment area in sqm |
| number_of_rooms | Number of living-, bed-, lounge- (etc.) rooms |
| room_area_total | Total area of the rooms in sqm |
| room_area_ratio | Ratio of the room_area_total to the total_area |
| room_area_mean | Average of room areas |
| room_area_std | Standard deviation of room areas |
| room_sunlight | The average amount of direct daylight received by the rooms (klx) |
| room_noise | The average car traffic noise received on the area’s windows, from daytime and night-time, by the rooms (dB(A)) |
| largest_room_area_total | Largest room area in sqm |
| largest_room_area_ratio | Ratio of the largest_room_area_total to the total_area |
| largest_room_sunlight | Amount of direct daylight received by the largest room (klx) |
| largest_room_noise | Car traffic noise received on the area’s windows, from daytime and night-time, by the largest room (dB(A)) |
| kitchen_number | Number of kitchens |
| kitchen_area_total | Total area of the kitchens in sqm |
| kitchen_area_ratio | Ratio of the kitchen_area_total to the total_area |
| kitchen_area_mean | Average of kitchen areas |
| kitchen_area_std | Standard deviation of kitchen areas |
| kitchen_sunlight | The average amount of direct daylight received by the kitchens (klx) |
| kitchen_noise | The average car traffic noise received on the area’s windows, from daytime and night-time, by the kitchens (dB(A)) |
| bathroom_number | Number of bathrooms |
| bathroom_area_total | Total area of the bathrooms in sqm |
| bathroom_area_ratio | Ratio of the bathroom_area_total to the total_area |
| bathroom_area_mean | Average of bathroom areas |
| bathroom_area_std | Standard deviation of bathroom areas |
| bathroom_sunlight | The average amount of direct daylight received by the bathrooms (klx) |
| bathroom_noise | The average car traffic noise received on the area’s windows, from daytime and night-time, by the bathrooms (dB(A)) |
| corridor_number | Number of corridors |
| corridor_area_total | Total area of the corridors in sqm |
| corridor_area_ratio | Ratio of the corridor_area_total to the total_area |
| corridor_area_mean | Average of corridor areas |
| corridor_area_std | Standard deviation of corridor areas |
| corridor_sunlight | The average amount of direct daylight received by the corridors (klx) |
| corridor_noise | The average car traffic noise received on the area’s windows, from daytime and night-time, by the corridors (dB(A)) |
| site_id | Same as in geometries_w_outlines.csv |
| building_id | Same as in geometries_w_outlines.csv |
| floor_id | Same as in geometries_w_outlines.csv |
| apartment_id | Same as in geometries_w_outlines.csv |
| bfa | Same as in geometries_w_outlines.csv |
Data processing
The original dataset comprised data from a total of N = 37,174 apartments.
In the first step, we excluded flats that were either underground or above the 6th floor (n = 2,271), had a total area outside of the 25-200 sqm range (n = 1,094), lacked data on room noise (n = 4) or room sunlight (n = 4), or had a value of zero for either of these two variables (n = 810). As only a small number of flats (n = 387) had more than five rooms, these were also excluded. This resulted in a sample size of N = 32,990 apartments.
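The exclusion rules above can be written as a single predicate. This is only a sketch: it borrows the column names of "apartment_simulations.csv" (the original data may name them differently), and it assumes that floor 0 is the ground floor, that the 25-200 sqm bounds are inclusive, and that missing values are encoded as `None`. The function name `passes_first_filter` is hypothetical.

```python
def passes_first_filter(apt: dict) -> bool:
    """Return True if an apartment record survives the first exclusion step."""
    noise = apt.get("room_noise")
    sunlight = apt.get("room_sunlight")
    # Missing noise or sunlight data (assumed to be encoded as None).
    if noise is None or sunlight is None:
        return False
    # A value of zero for either variable is also excluded.
    if noise == 0 or sunlight == 0:
        return False
    # Underground or above the 6th floor (assuming floor 0 = ground floor).
    if apt["No_floor"] < 0 or apt["No_floor"] > 6:
        return False
    # Total area outside the 25-200 sqm range (bounds assumed inclusive).
    if not 25 <= apt["total_area"] <= 200:
        return False
    # More than five rooms.
    if apt["number_of_rooms"] > 5:
        return False
    return True
```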
Shape clusters
To identify clusters that contain apartments with the same layout, we devised the following algorithm:
First, to account for potential rotation of the apartments, we oriented them along the horizontal (x-axis) and the vertical (y-axis). This was done by aligning the longest polygon edge within each apartment along the horizontal.
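A minimal sketch of this alignment step for a single closed ring of (x, y) vertices. It rotates about the origin and picks the first edge of maximal length; how ties were broken, and which polygons of an apartment were searched for the longest edge, are not stated in the text, so both are assumptions. The function name `align_longest_edge` is hypothetical.

```python
import math


def align_longest_edge(ring):
    """Rotate a closed ring so that its longest edge is parallel to the x-axis."""
    edges = list(zip(ring, ring[1:]))
    # First edge of maximal length (tie-breaking rule is an assumption).
    (x0, y0), (x1, y1) = max(edges, key=lambda e: math.dist(e[0], e[1]))
    angle = math.atan2(y1 - y0, x1 - x0)
    # Rotate every vertex by -angle about the origin.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    return [(x * cos_a - y * sin_a, x * sin_a + y * cos_a) for x, y in ring]
```

In a full pipeline the same rotation would be applied to every geometry of the apartment, not only to the ring that contains the longest edge.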
Next, we converted the vector graphics representation of the apartments to a 224x224 black-and-white raster representation that contained only area geometries (rooms, corridors, shafts, balconies etc. but not walls, columns, doors, or windows). Each such 224x224 matrix contained only values of 0 (black) and 1 (white).
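The rasterisation can be sketched with a plain even-odd point-in-polygon test on pixel centres. The scaling (uniform, fitted to the bounding box and anchored at its lower-left corner), the orientation (row 0 at the northern edge), and the choice of 1 for area pixels are all assumptions made for illustration; the text does not specify them. The function names are hypothetical.

```python
def point_in_ring(px, py, ring):
    """Even-odd rule: is the point (px, py) inside the closed ring?"""
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        if (y0 > py) != (y1 > py):
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


def rasterize(rings, size=224):
    """Rasterise closed rings (area geometries) into a size x size 0/1 matrix."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    if extent == 0:
        raise ValueError("degenerate geometry")
    cell = extent / size  # uniform scale keeps the aspect ratio
    grid = []
    for row in range(size):
        py = min_y + (size - row - 0.5) * cell  # row 0 is the northern edge
        grid.append([
            1 if any(point_in_ring(min_x + (col + 0.5) * cell, py, ring) for ring in rings) else 0
            for col in range(size)
        ])
    return grid
```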
These matrices were then compared to obtain a measure of distance between the apartment geometries. The apartments were first grouped according to number of rooms. Within each group, we then selected a random apartment as a reference matrix ($\mathbf{A}$) and calculated the sum of squared differences between this reference and each of the other apartment matrices ($\mathbf{B}$):
$$SS = \sum_{i = 1}^{n}\sum_{j = 1}^{n}(\mathbf{B}_{ij} - \mathbf{A}_{ij})^2$$
where $n$ = 224, the number of pixels per dimension.
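The measure is straightforward to compute; a minimal sketch on nested lists follows (the function name `sum_of_squares` is hypothetical). Because every entry is either 0 or 1, each squared difference is itself 0 or 1, so $SS$ equals the number of pixels at which the two rasters differ.

```python
def sum_of_squares(a, b):
    """Sum of squared element-wise differences between two equally sized matrices."""
    if len(a) != len(b) or any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return sum(
        (value_b - value_a) ** 2
        for row_a, row_b in zip(a, b)
        for value_a, value_b in zip(row_a, row_b)
    )
```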
After the initial alignment of the floor plans along the x and y axes, there remained eight possible orientations each floor plan could have. For this reason, each floor plan matrix was transformed into eight different matrices that represented these orientations, and only then compared to the reference using the sum of squares measure. This resulted in eight distance values per floor plan, yielding an eight-dimensional difference space, with each floor plan representing a point in this space and the reference flat positioned at its origin.
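A sketch of how the eight distance values per floor plan might be obtained. It assumes that the eight orientations are the four 90-degree rotations of the matrix, each with and without mirroring; the text gives only the count, so this reading is an assumption. The function names are hypothetical.

```python
def rotate90(m):
    """Rotate a square matrix by 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]


def mirror(m):
    """Flip a matrix left to right."""
    return [row[::-1] for row in m]


def eight_orientations(m):
    """Four rotations, each with and without mirroring (assumed reading of 'eight')."""
    result = []
    current = [list(row) for row in m]
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate90(current)
    return result


def distance_vector(reference, plan):
    """Sum of squared differences between the reference and each orientation of plan."""
    return [
        sum((b - a) ** 2 for ra, rb in zip(reference, o) for a, b in zip(ra, rb))
        for o in eight_orientations(plan)
    ]
```

Repeating the computation against a second reference and concatenating the two vectors gives a 16-dimensional representation per floor plan.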
To further expand this difference space, we selected another floor plan as a second reference and repeated the process. This floor plan was chosen from the middle of the distance distributions in the first iteration. The resulting 16-dimensional data set was then passed to the Self-Organising Map algorithm (SOM; Kohonen, 1990) with a grid size equal to $\lfloor\sqrt{n}\rfloor$, where $n$ here represents the number of floor plans with a given number of rooms. This procedure yielded a grid position for each of the analysed floor plans, clustering them according to shape.
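To make the SOM step concrete, here is a deliberately small, from-scratch SOM in pure Python. It is not the implementation used for the dataset: the square grid, the reading of "grid size" as the number of nodes per side, the learning-rate and radius schedules, and the initialisation from random samples are all assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Return the grid position whose weight vector is closest to the sample."""
    return min(
        weights,
        key=lambda node: sum((w - s) ** 2 for w, s in zip(weights[node], sample)),
    )


def train_som(data, side, epochs=20, seed=0):
    """Train a side x side SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise every node with a randomly chosen sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(side) for c in range(side)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for sample in rng.sample(data, len(data)):
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress)                    # decaying learning rate
            radius = max(side / 2 * (1.0 - progress), 0.5)   # shrinking neighbourhood
            bmu = best_matching_unit(weights, sample)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                pull = rate * math.exp(-grid_d2 / (2 * radius ** 2))
                for k, s in enumerate(sample):
                    w[k] += pull * (s - w[k])
            step += 1
    return weights
```

With `side = math.isqrt(len(data))`, calling `best_matching_unit(weights, sample)` for each floor plan vector gives its grid position, and floor plans sharing a position form one SOM cluster.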
The above-described algorithm produced somewhat liberal clusters, occasionally grouping dissimilar floor plans into a single cluster, and so a further step was implemented to refine the clustering. Within each of the SOM clusters, we started by designating the first floor plan (its position was arbitrary) as the reference and once again calculated the distance between it and the eight orientations of all other floor plans in the same SOM cluster. If the smallest of the eight distance measures was less than an empirically determined threshold of 2,000, the floor plan in question was added to a cluster defined by the reference floor plan. Next, the first of the remaining floor plans that did not pass the threshold was designated the reference for the next cluster and the process was repeated. Once all floor plans within the same SOM cluster had been assigned, the algorithm was repeated on the next SOM cluster.
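The refinement step is a greedy, reference-based grouping, sketched below for one SOM cluster. The `distance` argument is a caller-supplied function that is assumed to return the smallest of the eight orientation distances between two floor plans; the function name `refine_cluster` is hypothetical.

```python
def refine_cluster(plans, distance, threshold=2000):
    """Greedily split one SOM cluster into groups of near-identical floor plans."""
    remaining = list(plans)
    clusters = []
    while remaining:
        reference = remaining[0]       # first unassigned plan defines a new cluster
        members, rest = [reference], []
        for plan in remaining[1:]:
            if distance(reference, plan) < threshold:
                members.append(plan)   # close enough to the reference
            else:
                rest.append(plan)      # reconsidered against the next reference
        clusters.append(members)
        remaining = rest
    return clusters
```

Each floor plan is compared only against the references of the clusters formed before it is assigned, and only within its own SOM cluster, which keeps the number of comparisons low.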
While the "refinement" algorithm could have been applied to the raw, unclustered data, the benefit of first creating the SOM clusters was a massive reduction of apartment-to-apartment comparisons and thus of computational resources.
This procedure yielded 11,173 rotation- and translation-invariant clusters of geometrically identical floor plans. The number of floor plans per cluster ranged from 1 to 96.
Filtering observations
Once the data was sorted into clusters according to the geometry of floor plans, we removed apartments that were deemed essentially duplicates with respect to the values of average room noise and average room sunlight. These essential duplicates were identified as follows:
First, we discretised the noise variable into categories by 5 dB increments and the sunlight variable by increments of 200 lx. Within each cluster and for each level of noise, we then retained one observation at every given level of the discretised sunlight variable (if available). The retained observation was the one with the lowest score on the original, continuous average room sunlight variable.
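The filtering rule amounts to keeping one row per (cluster, noise bin, sunlight bin). The sketch below assumes that bins start at zero, that sunlight is stored in klx as in the codebook (so 200 lx corresponds to a step of 0.2), and that the cluster label lives in a key named `cluster`; that key and the function name `drop_near_duplicates` are hypothetical.

```python
import math


def drop_near_duplicates(rows, noise_step=5.0, sun_step=0.2):
    """Keep, per cluster and noise/sunlight bin, the row with the lowest sunlight."""
    best = {}
    for row in rows:
        key = (
            row["cluster"],
            math.floor(row["room_noise"] / noise_step),    # 5 dB bins
            math.floor(row["room_sunlight"] / sun_step),   # 200 lx bins (klx units assumed)
        )
        if key not in best or row["room_sunlight"] < best[key]["room_sunlight"]:
            best[key] = row
    return list(best.values())
```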
This further reduced the data set to the final size of n = 20,419 essentially unique apartments.
Automatic outline generation
The design heuristics developed within the Neufert 4.0 project offer diverse insights and take into account various parameters relevant to architectural design. However, these heuristics require the outline of the apartment, which serves as a necessary input for our application. While Archilyse provides vector geometric data for entities such as rooms, walls, doors, and windows, it does not supply the outlines of floors and apartments, making it necessary for us to extract the outlines automatically. Due to missing or erroneous geometric data in the dataset, extracting outlines is not a trivial task. We automatically extract apartment outlines from the existing entity vector geometry using a multi-step geometric computation process. First, we apply an offset to all entity polygons to fill gaps caused by inaccuracies or missing geometric data. Second, we union all offset polygons to obtain the desired outline. Third, the outline is offset back to its original size. However, the presence of extended walls in the original geometric data results in an irregular outline. To address this, we perform a reverse offset on the outline, which removes extended walls whose size is smaller than the offset distance. The outline is then offset back to its original size, yielding the final outline.
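The offset sequence described above has a direct raster analogue: offset-union-offset back is a morphological closing, and reverse offset-offset back is a morphological opening. The sketch below shows that analogue on a 0/1 grid using only the standard library. It is a swapped-in illustration, not the vector procedure used for the dataset, which operates on polygons; the square neighbourhood and the function names are assumptions.

```python
def dilate(grid, r):
    """Grow the foreground by r cells (square neighbourhood)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            1 if any(
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ) else 0
            for j in range(w)
        ]
        for i in range(h)
    ]


def erode(grid, r):
    """Shrink the foreground by r cells; cells outside the grid count as background."""
    h, w = len(grid), len(grid[0])
    return [
        [
            1 if all(
                0 <= y < h and 0 <= x < w and grid[y][x]
                for y in range(i - r, i + r + 1)
                for x in range(j - r, j + r + 1)
            ) else 0
            for j in range(w)
        ]
        for i in range(h)
    ]


def clean_footprint(grid, r):
    """Closing (fills small gaps) followed by opening (removes thin protrusions)."""
    closed = erode(dilate(grid, r), r)   # offset out, merge, offset back
    return dilate(erode(closed, r), r)   # reverse offset, offset back
```

On a grid with two rooms separated by a one-cell gap and a one-cell-wide wall stub sticking out, `clean_footprint(grid, 1)` returns a single rectangular footprint: the closing fills the gap, and the opening removes the stub because it is too narrow to survive the erosion.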
As previously mentioned, the original geometric data is highly complex and often contains missing or erroneous elements, so the automatic outline extraction method is not effective for all floor plans. However, obtaining accurate outlines is crucial for subsequent studies. Therefore, the automatically extracted outlines needed to be checked manually.
Manual outline extraction
While the Archilyse data set contains undoubtedly useful data of high quality, there are instances where the process applied to retrieve floor plan geometries yielded suboptimal results. The abnormalities included walls extending outside of the bounds of the floor plan or geometries labelled as entrances or windows disconnected from room geometries. For this reason, we decided to conduct a clerical audit of the data set and employed three graduate assistants who reviewed the data, flagged problematic geometries, and manually drew apartment outlines.
To facilitate the audit process, we developed a web application hosted on the Bauhaus-University Weimar’s server. This application provided the clerical auditors with visualisation, review, and outline drawing tools, making the entire process less laborious and more efficient.
The audit procedure was as follows: Upon visiting the URL where the web application was hosted, the reviewers were presented with the apartment geometries, always displayed in the context of the floor. The order of presentation was such that the reviewers always reviewed all floor plans of interest on the same floor and within the same building. If a given floor plan was not to be reviewed because it had been excluded in the data cleaning process described above, it was automatically skipped.
The reviewers’ first task was to visually inspect the presented floor plan and, if something about it appeared wrong, flag it as problematic. Next, the application displayed the automatically generated floor plan outline (see previous section) as a closed polyline. The reviewers were asked to either approve the outline or, if needed, edit it. This was done by dragging, adding, or removing the vertices of the polyline. Finally, the auditors highlighted inside walls (those that do not form the facade of the containing building) by clicking on the appropriate line segments. Upon confirmation of the review, the data were saved and the next floor plan was presented. After a short period of training, the entire process could be completed in under a minute.
Overview of floor plan audit process
At the end of the review, outlines of all 20,419 floor plans were either approved or amended. Of those floor plans, 268 were flagged as problematic, with 559 geometrical elements (e.g., wall segment, door, space) flagged as “unusual”, and four floor plans were marked as “wrong”. In seven instances, there were geometrical elements missing from the geometry of the entire floor (e.g., apartments not included in the Archilyse dataset).
References
T. Kohonen, "The self-organizing map," in Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sept. 1990, doi: 10.1109/5.58325.
Created: 2024-12-06



