Speed up chart generation

Description

Creating a chart for a large dataset takes a long time, and sometimes the request even times out. And we aren't even talking about millions of records! So chart generation needs to be sped up.

Discussion and updates

Paul Hoekman Johannesburg 2140 points

1 year, 10 months ago

New task was created

Paul Hoekman Johannesburg 2140 points

1 year, 10 months ago

Task was assigned to Paul Hoekman

Paul Hoekman Johannesburg 2140 points

1 year, 10 months ago

Status change: Open → In Progress

Paul Hoekman Johannesburg 2140 points

1 year, 10 months ago

A new caching system has been set up. Just to keep the technical functioning documented:

The Django caching system has been enabled.
We specifically cache the JSON objects that are needed as data input for our charts. These objects are time-consuming to create because we loop over all the data to format it in the right way. However, for any given dataset there is no need to keep doing this over and over, so it's ideal to cache.
We cache indefinitely. However, we keep track of what is cached, because when a dataset is updated we need to remove the cached objects and re-generate the cache.
In the meta_data of the library item we store which cache objects have been created so far.
In principle, the cache is stored the first time a chart is displayed (the system sees it does not exist and then stores it in cache).
However, for some charts the loading time exceeds 30 seconds and a server time-out happens, so nothing is ever stored. For those datasets there is an admin button in the left-hand menu (the total number of buttons got too wild, so I hid them away under an "admin options" button to keep it all a bit cleaner). When clicking this button, the system will schedule the cache creation server-side, where there is no problem with the time-out. This is done through a cron job.
There are certain "sliced" data visualizations (e.g. when looking at a spatial subset of the data). These are separate JSON objects. In order to distinguish between these, a cache key is generated that includes all GET parameters (because we pass various conditioning parameters through the URL). Each cache object is saved with a unique cache key that varies even within the same dataset, so that we can have multiple cache objects for an individual dataset.
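
For anyone reading along later, here is a rough sketch of how the pieces above could fit together: a cache key built from the GET parameters, a "store on first display" lookup, and the list of keys kept in meta_data so they can be cleared when a dataset is updated. This is not the actual site code. The names make_cache_key, get_chart_json, invalidate_dataset_cache, the "cache_keys" entry in meta_data and the "chart:" key prefix are all made up for illustration, and DictCache is only a stand-in so the snippet runs without Django. In a Django project, django.core.cache.cache provides get, set and delete_many, and set with timeout=None stores without expiry.

```python
import hashlib
import json
from urllib.parse import urlencode


class DictCache:
    """In-memory stand-in for Django's cache object (assumption for this sketch)."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value, timeout=None):
        # timeout is accepted for API parity; this stand-in never expires entries.
        self._store[key] = value

    def delete_many(self, keys):
        for key in keys:
            self._store.pop(key, None)


cache = DictCache()


def make_cache_key(dataset_id, get_params):
    """Build a key that differs per dataset and per combination of GET parameters."""
    # Sort the parameters so the same slice always maps to the same key,
    # regardless of the order in which they appear in the URL.
    query = urlencode(sorted(get_params.items()))
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]
    return f"chart:{dataset_id}:{digest}"


def get_chart_json(dataset_id, get_params, meta_data, build_chart_data):
    """Return the chart JSON, building and caching it on first request."""
    key = make_cache_key(dataset_id, get_params)
    cached = cache.get(key)
    if cached is not None:
        return cached

    # The slow part: loop over the data and format it for the chart.
    payload = json.dumps(build_chart_data(dataset_id, get_params))
    cache.set(key, payload, timeout=None)  # cache indefinitely

    # Remember which keys exist for this dataset so they can be removed later.
    keys = meta_data.setdefault("cache_keys", [])
    if key not in keys:
        keys.append(key)
    return payload


def invalidate_dataset_cache(meta_data):
    """Drop every cached object recorded for a dataset, e.g. after an update."""
    cache.delete_many(meta_data.get("cache_keys", []))
    meta_data["cache_keys"] = []
```

A scheduled server-side job could call the same get_chart_json function for datasets that are too slow to build inside a web request, so the cached object is already there when the chart page is opened.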

Let's see how this performs over the coming weeks and months, but I can at least report that a 20-second generation time is down to less than 1 second for the charts that I have checked, so it is looking much better!

Screenshot_2022-09-30_09-17-21.png (15.3 KB)

Paul Hoekman Johannesburg 2140 points

1 year, 10 months ago

Status change: In Progress → Completed

Carolin Bellstedt Metro Vancouver Regional District 1280 points

1 year, 9 months ago

The technical stuff goes over my head with this one, but what caught my attention is that you changed the buttons. This is because I'm missing the “+ Add image” button that Guus had put there. It was very handy for adding visualisations of graphs (screenshots) and we are using it a lot. Did you remove it during this task by any chance, and could you add it back?
(The alternative of adding an image with link in the table, here for example, does not make for a good workflow.)

Paul Hoekman Johannesburg 2140 points

1 year, 8 months ago

Hey Carolin,
I didn't get or see a notification of your reply, sorry for the delay. I did indeed remove that + button for datasets, because I wasn't aware it was still being used in CL. As a general concept it isn't ideal if people upload images to datasets; instead, our system should dynamically generate the visualizations. But I will put in an exception so that people in the CL project will still see the button.
