Speed up chart generation

Created on Tuesday 27 September 2022, 14:42

  • ID
    1014012
  • Project
    Metabolism of Cities Data Hub
  • Status
    Completed
  • Priority
    Medium
  • Type
    Programming work
  • Tags
    General data hub improvements
  • Assigned to
    Paul Hoekman
  • Subscribers
    Paul Hoekman

Description

If a dataset is large, creating a chart takes a long time. Sometimes the request even times out because of the size. And we aren't even talking about millions of records! So this needs to be sped up.

Discussion and updates


New task was created


Task was assigned to Paul Hoekman


Status change: Open → In Progress


A new caching system has been set up. Just to keep the technical functioning documented:

  • The Django caching system has been enabled
  • We specifically cache the JSON objects that are needed as data input for our charts. These objects are time-consuming to create because we loop over all the data to format it in the right way. However, the result for a given dataset stays the same until the dataset changes, so there is no need to keep doing this over and over, which makes it ideal to cache.
  • We cache indefinitely. However, we keep track of what is cached because when a dataset is updated we need to remove it and re-generate the cache.
  • In the meta_data of the library item we store which cache objects have been created so far
  • In principle, the cache is stored the first time a chart is displayed (the system sees it does not exist and then stores it in cache)
  • However, for some charts the loading time exceeds 30 seconds and a server time-out happens, so nothing is ever stored. For those datasets there is an admin button in the left-hand menu (the total number of buttons got too wild, so I hid them away under an "admin options" button to keep it all a bit cleaner). When clicking this button, the system will schedule the cache creation server-side, where there is no problem with the time-out. This is done through a cron job.
  • There are certain "sliced" data visualizations (e.g. when looking at a spatial subset of the data). These are separate JSON objects. In order to distinguish between these, a cache key is generated that includes all GET parameters (because we pass various conditioning parameters through the URL). Each cache object is therefore saved with its own key, even when it belongs to the same dataset, so that we can have multiple cache objects for individual datasets.

Let's see how this performs over the coming weeks and months, but I can at least report that a 20-second generation time is down to less than 1 second for the charts that I have checked, so it is looking much better!


Status change: In Progress → Completed


The technical stuff goes over my head with this one, but what caught my attention is that you changed the buttons... this is because I'm missing the “+ Add image” button that Guus had put there. It was very handy for adding visualisations of graphs (screenshots) and we are using it a lot. Did you remove it during this task by any chance, and could you add it back?
(The alternative of adding an image with link in the table, here for example, does not make for a good workflow.)


Hey Carolin,
I didn't get or see a notification of your reply - sorry for the delay. But I did indeed remove that + button for datasets, because I wasn't aware it was still being used in CL. As a general concept it isn't ideal if people upload images to datasets; instead, our system should dynamically generate the visualizations. But I will put an exception in so that in the CL project people will still see the button.