Community Portal

Process dataset

Created on Thursday 27 May 2021, 13:35

  • ID
    989159
  • Project
    Metabolism of Cities Data Hub
  • Status
    In Progress
  • Priority
    Medium
  • Type
    Process a dataset
  • Related to
  • Assigned to
    Carolin Bellstedt
  • Subscribers
    Aristide Athanassiadis
    Carolin Bellstedt
    Guus Hoekman
    Paul Hoekman

Discussion and updates


This task was created by the system


Processing work was started


For the Sankey test, Guus had asked us to provide the data in a certain format, while Paul wanted it uploaded and processed in our standard way, separated by materials. I've made 3 different files (fruits, vegetables, cereals) and uploaded fruits for now to test it. Although we are using Munich as our test city, I've used Apeldoorn as the reference space in the file, because we don't actually have Munich yet.
Different issues and questions arose:

  1. Usually for other datasets, we can have several different materials in one. I'd like to understand why this is different for the Sankey? Can we still have several different materials show up in a "sector Sankey"? I wonder how we can "merge" the info of several flows back into one to have the sector Sankey - one showing several materials in one Sankey?
  2. With "uploading them the normal way" and processing them, I suppose Paul meant to do it just like flow data is usually uploaded? As I'm trying to do that, there is a conflict with the format, because the one that we usually use is different from the "Sankey format": it loses the FROM and TO info (e.g. EXTRACTION to MANUFACTURING) and the color column. For now, I will just leave the additional columns in a separate file, add it to the same data item entry and process the standard format.

I tried to follow through on point 2, but even without the additional columns and with a reference space that we know we have in the Data Hub (Canton Geneva), I get this error message: "Not all of your data could be properly processed. Please review the error below and upload a new file. ERROR: Could not load the spreadsheet data." I don't know what I'm doing wrong here... Paul?


Hi Carolin,

There was a miscommunication I think. It does not need to be split up by material. It needs to be split up by type of flow. The same way that different datasets would normally be uploaded by regular users. For instance, one dataset for extraction, one dataset for manufacturing, another for manufacturing waste, etc.

The point is basically that we need to be able to derive TO and FROM based on the category that it is uploaded into because indeed as you say, that is information that is not present in the file uploaded. The layer to which the file is uploaded should indicate which part of the sankey this belongs to.
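As a rough illustration of the idea above (not how the Data Hub actually does it), the sketch below derives FROM and TO from the layer a row was uploaded into and sums quantities across materials, so that several materials end up in one "sector Sankey" link. The layer names, process names, the `LAYER_TO_LINK` table and the row fields are all hypothetical.

```python
from collections import defaultdict

# Hypothetical layer-to-link table: the layer a file is uploaded into
# decides which (FROM, TO) pair its rows belong to. Names are illustrative.
LAYER_TO_LINK = {
    "Extraction": ("EXTRACTION", "MANUFACTURING"),
    "Manufacturing": ("MANUFACTURING", "RETAIL"),
    "Manufacturing waste": ("MANUFACTURING", "WASTE"),
}


def build_links(rows):
    """Sum quantities per (FROM, TO) pair, deriving the pair from each row's layer."""
    totals = defaultdict(float)
    for row in rows:
        layer = row["layer"]
        if layer not in LAYER_TO_LINK:
            raise ValueError(f"No Sankey link defined for layer {layer!r}")
        totals[LAYER_TO_LINK[layer]] += row["quantity"]
    return dict(totals)


# Made-up example rows; the quantities are not real data.
rows = [
    {"layer": "Extraction", "material": "Fruits", "quantity": 10.0},
    {"layer": "Extraction", "material": "Vegetables", "quantity": 5.0},
    {"layer": "Manufacturing waste", "material": "Fruits", "quantity": 2.0},
]

links = build_links(rows)
for (source, target), quantity in links.items():
    print(f"{source} -> {target}: {quantity}")
```

Keeping the material on each row means the same data could also be grouped per material later, if separate Sankeys per material are wanted.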

It will require uploading many separate files, but that is going to be necessary until we hop onto the Priority project and we make MFA data loading more properly embedded in the system.


Aaaaaah, alright. Now we are on the same page. I will upload and process those files then.
For example, I will add the files (one for biomass, one for construction) into the extraction layer to represent that the FROM is the extraction one, and I can use the comment column to indicate the TO (e.g. add manufacturing, export etc.). Is that ok? We will lose the color column for now then. Or do you have any other suggestions?
Maybe I will add in a Google Sheet what I envision for the way they are split up.


Great! I would say let's use the SEGMENT column to indicate how the flow is split up --- but all of this is a bit of a try-out given that it is the first time we do it.
Color info should indeed not go in the spreadsheet as this is more of a post-processing visualization setting.


Ahhh one more thing: the sankey will be based on a system diagram (not sure if you already drew this?). So make sure that you use the same phrasing that was used there - that will help map it. So in the SEGMENT column put the TO process with the exact same spelling as what was defined in the system diagram. If SEGMENT doesn't seem to make sense I am also happy for you to use the COMMENTS column.
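To make the "exact same spelling" point concrete, here is a small sketch of a check that could be run on a file before uploading. It assumes the rows carry a `segment` field and that the node names of the system diagram are available as a plain list; both are assumptions for illustration, and the function names are hypothetical.

```python
def normalize(name):
    """Collapse whitespace and ignore case, for suggesting near matches only."""
    return " ".join(name.split()).casefold()


def find_unmatched_segments(rows, diagram_nodes):
    """Return {segment value: suggested node or None} for values that do not
    match a system diagram node exactly."""
    exact = set(diagram_nodes)
    by_normalized = {normalize(node): node for node in diagram_nodes}
    problems = {}
    for row in rows:
        segment = row["segment"]
        if segment not in exact:
            # Suggest a node only if it differs in case or whitespace alone.
            problems[segment] = by_normalized.get(normalize(segment))
    return problems


# Hypothetical node names and rows.
diagram_nodes = ["EXTRACTION", "MANUFACTURING", "RETAIL"]
rows = [
    {"segment": "MANUFACTURING"},
    {"segment": "Manufacturing "},
    {"segment": "Export"},
]

problems = find_unmatched_segments(rows, diagram_nodes)
for value, suggestion in problems.items():
    print(f"{value!r} does not match a node; suggestion: {suggestion}")
```

A value with no suggestion (such as "Export" here) would need either a new node in the diagram or a corrected SEGMENT entry.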


Ok, there are no issues with using the SEGMENT column, so I've used that. But I'm running into a different issue (though I'm not sure it really is one), so it would be useful to hear your thoughts: the names of the layers the files are uploaded to don't correspond to the names of the nodes, although I assume they should, right? That is not the case for the MoC Data Hub layers in any event. I could upload them in the CL Data Hub, but there we have simplified the layers in Layer 3 (that is, reduced their number). Layer 2 has all the ones that we are using as nodes, except for RETAIL, which does not exist as its own layer. Imports/exports/wholesale/retail are all in one, because NACE doesn't differentiate between the sale or retail being national or international.
What is the best thing to do here? Create an additional RETAIL layer?

We hadn't done the system diagram yet in the FD builder, but I was going to get started on it now. Here is the system diagram. However, it only allows for NACE and Rupertismo List activities, not the rather generic names that we've been using. I would only be able to use those as the origin and destination labels of the block, but they don't exist in the drop-down of course. How should I go about this? Should I pick something else, with you using the label later on? That doesn't seem like a good long-term solution, but do let me know.

FYI, I will hold off on preparing and uploading the files until we have the naming situation figured out.


Noted Carolin.

Yeah with the layers not having exactly the same name it will require some work to match them up properly. This is also something I ultimately want to address in our relayering system. I can likely indicate to Guus how he can do this for CL in the meantime. I would not try and make them match fully right now and I would just leave your layers as they are, and once the system diagram is in place I can do a review and try to come up with a programmatic plan to link the things.

However, the incompatibility with NACE is indeed an issue. We should aim for the dropdown being able to offer the right option and not build a whole workaround for this. We can pick another catalog that is not NACE, or build our own if we really have to, but I would like to explore this in some more detail before we change things that dramatically.

Can you elaborate a bit more on which generic names are not present in the NACE codes? In terms of retail, I would personally say that "imports" and "exports" should not be seen as retail but instead as transportation flows. Division 49 of NACE seems to be quite suitable for this. Whether it is an import or an export can be determined by the system based on the origin and destination of the flow. What do you think? And I also seem to see retail and wholesale differentiation in NACE (46 vs 47), or not?
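A minimal sketch of how the import/export distinction could be derived from origin and destination, assuming each flow records both as reference space names and that "inside" simply means equal to the city's reference space. A real check would more likely test geographic containment; the function name and labels here are hypothetical.

```python
def classify_flow(origin, destination, reference_space):
    """Label a flow relative to the reference space of the city being studied."""
    origin_inside = origin == reference_space
    destination_inside = destination == reference_space
    if origin_inside and destination_inside:
        return "domestic"
    if destination_inside:
        return "import"
    if origin_inside:
        return "export"
    # Neither end touches the reference space.
    return "external"


# Made-up flows around a reference space named "Apeldoorn".
flows = [
    ("Rotterdam", "Apeldoorn"),
    ("Apeldoorn", "Munich"),
    ("Apeldoorn", "Apeldoorn"),
]
labels = [classify_flow(origin, destination, "Apeldoorn") for origin, destination in flows]
print(labels)
```

With this approach the uploaded file would not need separate import and export categories; the label follows from the two ends of the flow.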


Just for record keeping: as discussed by phone we will freeze this discussion and you will work with Guus on a hard-coded sankey to make life easy and meet your deadlines. We will pick this up in due course and can discuss a sankey generator etc in our upcoming sprint.