This task was created by the system
Created on Wednesday 21 April 2021, 13:48
Project: Metabolism of Cities Data Hub
Type: Process a dataset
Assigned to: Carolin Bellstedt
Subscribers: Aristide Athanassiadis, Carolin Bellstedt, Paul Hoekman
Discussion and updates
Processing work was started
I finally finished formatting the population data (oof, that was brutal). I've tried to upload a new file when processing, but every time I'm redirected to the "Metabolism of Cities is updating" page. The file is attached here.
Let me know if I'm doing something wrong.
Hey Paul, I'm just posting another message about this topic so it doesn't fall off your radar :)
Thanks Aris, this slipped through the cracks; I hadn't noticed it before! (We need a global notification system for relevant stuff, somehow.) Anyway, the problem is that the dataset is too large to be processed in a single user session (which is limited to roughly 30 seconds). Just like we did with shapefiles, we need to process it by putting it in a queue and running a script server-side afterwards. I made a task here. Please subscribe to the task, and when I have completed it, try again. Thanks!
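To illustrate the queue approach described above, here is a minimal Python sketch. Everything in it (`ImportJob`, `enqueue_upload`, `run_worker`, the batch size) is hypothetical and not the Data Hub's actual code; it only shows the pattern of storing the upload during the web request and doing the heavy processing later in a server-side script.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class ImportJob:
    """Hypothetical queued import: the upload is stored during the web request, not processed."""
    dataset_name: str
    records: list
    status: str = "queued"
    processed: int = 0

# In a real deployment this would be a database table; a deque keeps the sketch self-contained.
job_queue = deque()

def enqueue_upload(dataset_name, records):
    """Called from the web request: only store the job, so the request returns within the time limit."""
    job = ImportJob(dataset_name, records)
    job_queue.append(job)
    return job

def run_worker(process_record, batch_size=1000):
    """Server-side script (for example run periodically): drain the queue, working through records in batches."""
    finished = []
    while job_queue:
        job = job_queue.popleft()
        job.status = "processing"
        try:
            for start in range(0, len(job.records), batch_size):
                for record in job.records[start:start + batch_size]:
                    process_record(record)
                # Record progress after each batch (a real worker might also commit here).
                job.processed = min(start + batch_size, len(job.records))
            job.status = "done"
        except Exception as exc:
            job.status = f"failed: {exc}"
        finished.append(job)
    return finished

# Usage with placeholder records (not real population data):
saved = []
job = enqueue_upload("World population", [{"space": "Example", "year": 2020}] * 2500)
run_worker(saved.append)
print(job.status, job.processed, len(saved))  # done 2500 2500
```

The point of the split is that the request only does a cheap append, while the slow per-record work runs without a session timeout.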
BTW, 16k records... all countries in the world... epic stuff, Aris! I can imagine prepping this was brutal. Thanks for persevering, though; it will be exciting to get this in!!
OK, great Paul! Thanks for making the task, I just subscribed!
And you're right, prepping this was quite painful, but we won't need to change it for a while! Btw, there were some inconsistencies between the country names used by the UN and ours. But normally it's all well formatted.
Noted, Aris. We'll see how it goes when we import the dataset!
Task was no longer assigned to Aristide Athanassiadis and status was changed: In Progress → On Hold
Task was assigned to Carolin Bellstedt and status was changed: On Hold → In Progress
Status change: In Progress → Completed
I've been picking this one up again and processed the formatted file. Two points to note:
- The numbers are in thousands. It might be better not to have them in thousands, so that the proxy can be calculated right away. That's easy enough to change. Should we do it, or is there a reason not to?
- It didn't cause any problems until the crunching step, when it said: "We have tried processing this file, but have encountered an error. We could not find the space with the name: 'Gambia'". The country is officially called The Gambia (see Wiki), and that is also how it exists as one of our reference spaces (in the "countries of the world" file). So I changed it and crunched it once more.
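The first point, converting the values out of thousands, amounts to multiplying each value by 1,000. A minimal Python sketch, assuming the data is available as rows with a value column; the column name `population` and the helper `thousands_to_persons` are hypothetical:

```python
def thousands_to_persons(rows, column="population"):
    """Return copies of the rows with `column` converted from thousands to absolute numbers."""
    converted = []
    for row in rows:
        new_row = dict(row)  # copy, so the original rows are left untouched
        # round() gives a whole number and removes float artefacts from the multiplication
        new_row[column] = round(row[column] * 1000)
        converted.append(new_row)
    return converted

# Placeholder value, not real data: 1234.567 thousand -> 1234567
rows = [{"space": "Example", "year": 2020, "population": 1234.567}]
print(thousands_to_persons(rows))
# [{'space': 'Example', 'year': 2020, 'population': 1234567}]
```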
The next hiccups became evident one by one, after each crunch attempt:
Saint Barthélemy --> Saint-Barthélemy: our ref space is written with a hyphen, but that's the French spelling; the English spelling has none (see Wiki).
Saint Martin --> Saint-Martin, French spelling (see Wiki)
Czechia --> Czech Republic
North Macedonia --> Macedonia: the country has been called North Macedonia since February 2019 (officially the Republic of North Macedonia), but Macedonia is the ref space name in our system. Should we change the ref name?
--> General question: which naming standard should we follow?
Last edited: 2022-08-11 10:46:37.848595+00:00
Great that you took this to the next level. My feedback:
- I would definitely NOT put the numbers in thousands. I personally don't see a benefit, and it definitely makes this dataset more confusing to use.
- I would recommend updating the Countries of the world dataset. We have version 4.1.0 in the system, and the current version is 5.1.1. I assume things like Macedonia are fixed in there. Also, there are a ton of name fields in there, and you might want to review whether other fields are a better fit.
Thanks for your feedback.
1) Ok. So we agree on that. I will change it then.
2) How do I go about updating that? Do I add the new version as a new entry (which would duplicate all the reference spaces)? Or do I replace the existing entry and reset the processing? In that case, the associated info would be lost. That's not a big deal for the population data, because I have to process it again anyway, but there might be other data associated with it.
I've had a look at the new version of Countries of the world, and Macedonia is indeed fixed. NAME_LONG seemed like the most suitable name field, but it uses the country's official language for spelling, so we run into the same French examples again. Therefore, I would use the NAME_EN name field, which takes care of all the listed issues.
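Comparing name fields like this can also be done by counting how many dataset names each candidate field matches. A minimal Python sketch, assuming the reference spaces are available as records keyed by the shapefile's field names (reading the shapefile itself is not shown); the records below are illustrative, based on the spellings discussed in this thread, and not read from the shapefile:

```python
def count_matches(dataset_names, ref_spaces, field):
    """Count how many dataset names occur in the given name field of the reference spaces."""
    available = {space[field] for space in ref_spaces if space.get(field)}
    return sum(1 for name in dataset_names if name in available)

def best_name_field(dataset_names, ref_spaces, fields):
    """Return (field, matches) for the candidate field that matches the most dataset names."""
    scores = {field: count_matches(dataset_names, ref_spaces, field) for field in fields}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Illustrative records, not actual shapefile contents.
ref_spaces = [
    {"NAME_LONG": "Saint-Barthélemy", "NAME_EN": "Saint Barthélemy"},
    {"NAME_LONG": "Saint-Martin", "NAME_EN": "Saint Martin"},
    {"NAME_LONG": "The Gambia", "NAME_EN": "The Gambia"},
]
dataset_names = ["Saint Barthélemy", "Saint Martin", "The Gambia"]
print(best_name_field(dataset_names, ref_spaces, ["NAME_LONG", "NAME_EN"]))  # ('NAME_EN', 3)
```

The function names are hypothetical; the count only says which field needs the fewest manual fixes, not which naming standard is preferable.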
I think it makes most sense to load the new shapefile. After all, country names and boundaries may well change, and keeping different versions of the countries in our system is good practice. And NAME_EN sounds good, then; glad to hear it will solve the issues!
Alright, thanks for the input. I've now added version 5.1.1 and chose NAME_EN.
So... hahaha, as you can see from all the status updates, this wasn't smooth sailing with the new countries file.
Here are all the changes that needed to be made:
São Tomé and Principe --> São Tomé and Príncipe
Republic of Cabo Verde --> Cape Verde
Côte d'Ivoire --> Ivory Coast
China --> People's Republic of China
Macao --> Macau
Dem. Rep. Korea --> North Korea
Republic of Korea --> South Korea
Brunei Darussalam --> Brunei
Lao PDR --> Laos
Timor-Leste --> East Timor
Bahamas --> The Bahamas
Wallis and Futuna Islands --> Wallis and Futuna
Russian Federation --> Russia
Faeroe Islands --> Faroe Islands
Vatican --> Vatican City
United States --> United States of America
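Since the same corrections will likely be needed whenever this UN dataset is refreshed, they could be kept as a lookup table and applied to the file before uploading. A minimal Python sketch built from the list above; the table and helper names are hypothetical:

```python
# Corrections from the list above: name in the population file -> NAME_EN name in our ref spaces.
NAME_FIXES = {
    "São Tomé and Principe": "São Tomé and Príncipe",
    "Republic of Cabo Verde": "Cape Verde",
    "Côte d'Ivoire": "Ivory Coast",
    "China": "People's Republic of China",
    "Macao": "Macau",
    "Dem. Rep. Korea": "North Korea",
    "Republic of Korea": "South Korea",
    "Brunei Darussalam": "Brunei",
    "Lao PDR": "Laos",
    "Timor-Leste": "East Timor",
    "Bahamas": "The Bahamas",
    "Wallis and Futuna Islands": "Wallis and Futuna",
    "Russian Federation": "Russia",
    "Faeroe Islands": "Faroe Islands",
    "Vatican": "Vatican City",
    "United States": "United States of America",
}

def apply_name_fixes(names, fixes=NAME_FIXES):
    """Replace known mismatching names; names without a fix are returned unchanged."""
    return [fixes.get(name, name) for name in names]

print(apply_name_fixes(["Russian Federation", "Belgium", "Lao PDR"]))
# ['Russia', 'Belgium', 'Laos']
```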
- If I deduce correctly, when crunching, the system returns an error message for the first name it doesn't find. I wonder if it could "scan" the rest of the names in one go to check for other unknowns, so that this process doesn't have to be repeated 16 times (as in this case)?
- In any case, once the nested spaces have an interface (see task), it will be useful for anyone working on a city to see the spelling used at the higher scale, so that they don't face this issue.
- Although the world population file is now processed for the years 1950-2020, the graph shows a loading wheel that never goes away. Maybe the dataset is too large to be displayed?
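The first point above could be handled by checking all names against the reference spaces before any record is processed, and reporting every unknown name in a single error. A minimal Python sketch, assuming the crunching step has the full list of names and the reference space names available up front; both function names are hypothetical and not the Data Hub's code:

```python
def find_unknown_spaces(names, known_spaces):
    """Return every name that is not a known reference space, once each, in first-seen order."""
    known = set(known_spaces)
    unknown, seen = [], set()
    for name in names:
        if name not in known and name not in seen:
            seen.add(name)
            unknown.append(name)
    return unknown

def check_before_crunch(names, known_spaces):
    """Hypothetical pre-check: report all unknown names in one error instead of only the first."""
    unknown = find_unknown_spaces(names, known_spaces)
    if unknown:
        listed = ", ".join(f"'{name}'" for name in unknown)
        raise ValueError(f"We could not find the spaces with the names: {listed}")

names = ["Belgium", "Czechia", "Gambia", "Czechia"]
known = ["Belgium", "Czech Republic", "The Gambia"]
print(find_unknown_spaces(names, known))  # ['Czechia', 'Gambia']
```

With a check like this, one crunch attempt would have listed all 16 mismatches at once.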
Last edited: 2022-09-02 15:14:22.150879+00:00
Noted, good that it is all working!
All these points sound doable. I would recommend asking the new programmer to look into them ;-)