Community Portal

Aggregate data by time period for data visualisation

Created on Thursday 1 September 2022, 14:16

Back to task list
  • ID
    1011537
  • Project
    Metabolism of Cities Data Hub
  • Status
    Open
  • Priority
    Medium
  • Type
    Programming work
  • Tags
    General data hub improvements
  • Assigned to
    joe
  • Subscribers
    Carolin Bellstedt
    Paul Hoekman
    joe

You are not logged in

Log In Register

Please join us and let's build things, together!

Description

Objective: We usually need data for a certain year. However, often times there is data provided in other time periods, daily, weekly, monthly, quartlery etc.
It would be great to aggregate data by time period. The user should be able to do so in the data viz configuration (chart editor, example).

Audience: The user could be someone who needs a viz for a paper, some study or a blog post and would like to save / print it or embed it.

How it works: Let's use the data example of the Natural gas production in Norway, 2017-2021, monthly to see how this should work:

  • the user goes to the chart editor of the file (see screenshot)
  • the aggregation function could live with the "other setting" options, see red arrow
  • there should be an AGGREGRATE BY: field that can be toggled on. It could have a dropdown list of options that it can be aggregated to, but that is optional for later. For now, let's make BY YEAR work.
  • the system needs to check if: (1) data are in the same unit so that they can be aggregated, (if they aren't in the same unit, there should be a warning that those need to be converted first, unless the system can do it (?), see unit task). (2) Further it needs to be checked what interval the data is in, to add it correctly. ( I don't know if the system can check this, or the user would need to select the interval.) --> When monthly, all 12 months need to be added up, and if weekly, all 52 weeks need to be added up etc. In this example, instead of having data for every month, there will be 5 years shown (2017, 2018, 2019, 2020, 2021).
  • the new tailored graphic should be able to be saved separately, see task which allows that
  • when the main chart is saved with the aggregated data, there could be a small warning on the main page "This data is aggregated by year. To see detailed data, you can change this in the chart editor."

Discussion and updates


New task was created


I've updated the task description. There are a few uncertainties with regards to what "the system can and cannot do", but those can hopefully be clarified by the programmer taking up this task.


Great to see additional details. In response to this:

Further it needs to be checked what interval the data is in, to add it correctly. ( I don't know if the system can check this, or the user would need to select the interval.)

In the database we store the start and end date for each data point, so yes, the system can check this. However, you need to define what the system needs to do when there is a mismatch between selected interval and the date ranges (e.g. I select AGGREGATE BY YEAR and I have a data point that covers Dec 1-Jan 10... what happens then?).


Ok, great that the system can check and recognise this.

As for the mismatch, I see three solutions:
1. The system brings up an arrow message and says something like: "This data range includes an interval that is not conform with the one that is required. The data cannot be aggregated until this is resolved. Error: It covers "Dec 1-Jan 10"." Then there could be an explanation of how this could be resolved by breaking the amount of that range down per day (e.g. 120 tonnes for Dec 1-Jan 10 (=40 days) --> 3 tonnes daily. 3*31 days of Dec --> 93 tonnes in Dec).
2. The system makes the above calculation (if possible?), and provides it, but with a note in the meta data on data quality.
3. The system excludes the outlier (for a maximum of one month in the case of a year) and states in the data viz with an asterisk that "The data does not cover a full year, but only Jan-Nov, due to an incompatible data interval."

Which option makes the most sense or is most feasible?


Option 3 would be most feasible I think - the other ones require very complex queries that make development and maintaining very hard.


Alright, option 3 it is. The rules could be that it should be a minimum of 48 weeks to make up a year, 11 months to make up a year and all four quarters to make up one year. If it is less, the system shouldn't aggregate them, because it wouldn't be representative of a year.


OK sure, I leave it to you to define the rules, but you might also want to leave some of the decision-making to the user and not have complex restrictions in the code for things that could be best figured out by the one creating the visualization. In any case, your programmer will let you know what is easy and what is not.


Task was assigned to joe