Safely switching off legacy software is an easy way to reduce running costs. Read on for how we helped the Coastguard remove their dependency on a legacy Content Management System without losing any of the content.
The Coastguard relied on a legacy Content Management System (CMS) that incurred significant year-on-year licensing costs. They needed to retire the CMS, and were looking for a safe way to decommission it without losing access to any of the historical content.
The challenge was to extract and back up the old CMS content so that it could be accessed without the Coastguard paying any further licensing costs. The extracted content also needed to be searchable. In addition, this searchable backup would form part of the content migration strategy from the legacy system to the new one.
We needed to extract HTML, PDFs, Word documents, PowerPoint presentations, videos and embedded YouTube videos.
Sometimes simple solutions are the best, and this case called for brute force. We developed a bespoke web crawler that worked within the confines of the CMS, extracting HTML, links, images and documents. We were then able to identify every unique piece of content and offload it.
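To illustrate the general idea, here is a minimal Python sketch of a crawler that stays on one host and keeps each unique piece of content once. It is not the crawler built for this project. The fetch callable, the function names and the choice of hashing the response body to spot duplicates are all assumptions made for the example.

```python
import hashlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl confined to the host of start_url.

    fetch is a hypothetical callable: url -> (content_type, body_bytes).
    Returns {sha256_hex: (first_url_seen, content_type, body_bytes)}.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen_urls = {start_url}
    unique = {}

    while queue:
        url = queue.popleft()
        content_type, body = fetch(url)
        digest = hashlib.sha256(body).hexdigest()
        # Identical bytes reached via a different URL are stored only once.
        unique.setdefault(digest, (url, content_type, body))

        if not content_type.startswith("text/html"):
            continue  # documents and images are stored but not parsed

        parser = LinkCollector()
        parser.feed(body.decode("utf-8", errors="replace"))
        for link in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlparse(absolute).netloc == host and absolute not in seen_urls:
                seen_urls.add(absolute)
                queue.append(absolute)
    return unique
```

In a real run, fetch would wrap an HTTP client and would need to handle errors, authentication and rate limiting, none of which is shown here.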
A lot of project time goes into problem solving, and this was no exception. For example, our solution required the offloaded content to be displayed within an iframe, so links had to be dynamically rewritten to work within modern browser security constraints.
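The exact rewrites are not described here, but one plausible case is a link that still points at the old CMS host and therefore fails or navigates away when the page is shown inside an iframe. The Python sketch below maps such links onto a path served by the archive itself. The host name, the /archive prefix and the function name are hypothetical, and a regular expression is used only to keep the example short.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches href="..." or src="..." but not attributes such as data-src.
ATTR = re.compile(r'(?<![\w-])(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def rewrite_links(html, page_url, legacy_host, archive_prefix):
    """Point links to the legacy host at archive_prefix; leave others alone."""

    def replace(match):
        attr, quote, value = match.groups()
        parts = urlparse(urljoin(page_url, value))  # resolve relative links
        if parts.scheme in ("http", "https") and parts.netloc == legacy_host:
            new_value = archive_prefix + parts.path
            if parts.query:
                new_value += "?" + parts.query
            if parts.fragment:
                new_value += "#" + parts.fragment
            return f"{attr}={quote}{new_value}{quote}"
        return match.group(0)  # external, mailto: and similar links

    return ATTR.sub(replace, html)
```

For example, with page_url set to "http://cms.example/news/index.html", a relative src of "../img/a.png" becomes "/archive/img/a.png", while a link to another site is left untouched.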
The requirement to search the content of PDF and Word documents was tricky, so finding good open source projects to help with the text extraction was key.
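To give a feel for what text extraction involves, the Python sketch below pulls plain text out of a modern Word file using only the standard library. It is an illustration, not one of the open source projects used on this engagement, and it covers only the .docx format, which is a ZIP archive containing word/document.xml. Older .doc files and PDFs need a dedicated extraction library. The function name is an assumption.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(data):
    """Return the paragraph text of a .docx file given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

The returned text can then be passed to whatever search index sits behind the frontend.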
From the raw extracted data, we built a web frontend that allowed users to navigate and search the extracted content. Finally, we hosted it on the customer's AWS instance.
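The search technology behind the frontend is not named here. As a rough illustration of how extracted text can be made searchable, the Python sketch below builds a small inverted index and answers queries where every word must match. All names are hypothetical, and a production system would more likely use an existing search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it.

    documents is a dict of {doc_id: extracted_text}.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Here doc_id could be the archived URL of a page or document, so that a search result links straight back to the offloaded content.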
We secured and indexed the content from the CMS, so the Coastguard could safely switch off the legacy software and recoup the running costs.