Data automation
Converting printed catalogue images for use on an eCommerce store.
The client used 4 people over 3 days to try to find and rename 1,500 catalogue images manually, so that the images could be passed to their web agency, which needed them to build an online eCommerce store. The results were inconsistent and the work was rejected.
We got involved and, in just under 2 days, developed an automated process that could consistently find, rename, and process the catalogue images.
The starting point was to automatically scrape metadata from the software used to produce the printed catalogue (Adobe InDesign). This metadata was then used to analyse all 1,500 images: their page location, file name, colour space and file format. The metadata ran to thousands of lines, but we were able to extract the key information we needed.
Below is an example of the metadata.
PUBLICATION NAME: Filename.indd
PACKAGE DATE: 28/07/2021 15:25
Creation Date: 23/04/2021
Modification Date: 20/07/2021
SUMMARY
(Show Data for Hidden and Non-Printing Layers)
Fonts: 20 Fonts Used; 0 Missing, 10 Embedded, 0 Incomplete, 0 Protected
Links and Images: 6 Links Found; 0 Modified, 0 Missing, 0 Inaccessible
Images: 0 Embedded, 1 use RGB colour space
Colours and Inks: 4 Process Inks; 0 Spot Inks
CMS is ON
External Plug-ins 2
FontManagementIDCC2018.InDesignPlugin
Non Opaque Objects :On Page6
FONTS
22 Fonts Used; 1 Missing, 12 Embedded, 0 Incomplete, 0 Protected
- Name: Grotesque-Bold; Type: Type 1, Status: Embedded
Filename: NA
Full Name: Grotesque-Bold
First Used on Page: First Used on Page: 3 (PDF) logo.ai
Protected: No
- Name: Bembo-Regular; Type: OpenType Type 1, Status: OK
Filename: Macintosh HD:Library:Fonts:Bembo:Bembo Regular.otf
Full Name: Bembo Regular
First Used on Page: 5
Protected: No
- Name: Frutiger-Roman; Type: Type 1, Status: OK
Filename: Macintosh HD:Users:name.surname:Library:Adobe Systems:Frutiger-Roman:001.000:2018248800:FrutiRom
Full Name: 12 Frutiger* 55 Roman 05103
First Used on Page: 4
Protected: No
Data relating to each image typically looks like the example below, so we needed to write code that could identify the specific lines we needed.
- Name: advert.pdf; Type: Adobe Portable Document Format (PDF); Status: Linked
Filename: advert.pdf
Link Updated: Tuesday, 14 September 2021 10:34
File Last Modified:
Actual ppi: Effective ppi:
Layer Overrides: No
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/advert.pdf
Link Profile: None
On Page 6
- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked
Filename: shutterstock.jpg
Link Updated: Monday, 26 July 2021 14:11
File Last Modified:
Actual ppi: 300x300 Effective ppi: 2190x2190
Layer Overrides: N/A
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg
Link Profile: None
On Page 4
We were mainly interested in extracting the lines colour-coded below.
We used a cloud-based spreadsheet (Google Sheets) to keep a record of the required images, their current file names, and the new 'web-friendly' file names based on the product codes. This was critical for reconciling the images and capturing their transformation, before and after.
The code used regular expressions to copy the key information automatically out of the InDesign metadata file. The final report then shows:
File name
Page location
Colour space
File resolution (with an alert where the resolution is too low)
A field for the 'new' file name
An option to ignore the image
What the image will be renamed to (regardless of any new value in column F)
A preview of the image on Google Drive
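The regular-expression step can be sketched in Python as follows. This is a minimal illustration, not the code used on the project: it assumes the report is plain text laid out like the link entries in the excerpt above, and the `parse_links` and `colour_space` functions and the record keys (`name`, `colour_space`, `effective_ppi` and so on) are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Opening line of a link entry. In the excerpt, font entries use ", Status:"
# rather than "; Status:", so they do not match this pattern.
ENTRY_RE = re.compile(r"^- Name: (?P<name>.+?); Type: (?P<type>.+?); Status: (?P<status>.+)$")
PPI_RE = re.compile(r"^Actual ppi:\s*(?P<actual>\S*)\s*Effective ppi:\s*(?P<effective>\S*)$")
PATH_RE = re.compile(r"^Complete Name: (?P<path>.+)$")
PAGE_RE = re.compile(r"^On Page (?P<page>\S+)$")

# Assumption: the colour space appears as a word in the Type field, e.g. "JPEG RGB".
COLOUR_SPACES = {"RGB", "CMYK", "GRAY", "LAB"}


def colour_space(link_type: str) -> str:
    """Return the colour space named in a Type field, or '' if none is given."""
    for token in link_type.split():
        if token.upper() in COLOUR_SPACES:
            return token
    return ""


def parse_links(report: str) -> List[Dict[str, str]]:
    """Collect name, colour space, ppi, full path and page for each linked image."""
    records: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in report.splitlines():
        line = raw.strip()
        entry = ENTRY_RE.match(line)
        if entry:
            current = {
                "name": entry["name"],
                "colour_space": colour_space(entry["type"]),
                "actual_ppi": "",
                "effective_ppi": "",
                "path": "",
                "page": "",
            }
            records.append(current)
        elif line.startswith("- Name:"):
            current = None  # a font (or other) entry: stop filling the last image
        elif current is not None:
            if m := PPI_RE.match(line):
                current["actual_ppi"] = m["actual"]
                current["effective_ppi"] = m["effective"]
            elif m := PATH_RE.match(line):
                current["path"] = m["path"]
            elif m := PAGE_RE.match(line):
                current["page"] = m["page"]
    return records


SAMPLE = """\
- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked
Filename: shutterstock.jpg
Actual ppi: 300x300 Effective ppi: 2190x2190
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg
On Page 4
- Name: Bembo-Regular; Type: OpenType Type 1, Status: OK
First Used on Page: 5
"""

for record in parse_links(SAMPLE):
    print(record)
```

Each resulting record maps onto one spreadsheet row; blank ppi values (as for the placed PDF in the excerpt) are kept as empty strings rather than guessed.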
Where an image is used more than once in the printed catalogue (for example, in headers and footers), the 'Rename?' column is automatically greyed out to show that the image has already been dealt with.
There was also a workflow for ignoring specific images, such as those not related to products. This was done simply by ticking the 'Ignore' checkbox in column G.
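One way to decide which rows to grey out is to key each image on its full path (the Complete Name line of the report) and mark every occurrence after the first. This is a minimal sketch that assumes the same path always means the same image; `mark_repeats` is a hypothetical helper.

```python
from typing import List, Tuple


def mark_repeats(paths: List[str]) -> List[Tuple[str, bool]]:
    """Pair each image path with True if the same path appeared earlier in the list."""
    seen = set()
    marked = []
    for path in paths:
        marked.append((path, path in seen))  # True -> grey out: already dealt with
        seen.add(path)
    return marked


print(mark_repeats(["Jobs/logo.ai", "Jobs/shutterstock.jpg", "Jobs/logo.ai"]))
```

Only the first occurrence is left active, so each image is renamed exactly once however many pages it appears on.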
Column H automatically determines a suitable 'web-safe' file name for the online image.
It takes the product code in column F, works out which characters are 'web-friendly', and replaces any problematic characters with an underscore.
For example, 'Main image of part#123.jpg' would automatically be renamed 'Main_image_of_part_123.jpg'.
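A minimal sketch of the character replacement, assuming 'web-friendly' means ASCII letters, digits, dots, hyphens and underscores (the exact set used on the project is not stated). `web_safe_name` is a hypothetical helper.

```python
import re

# Assumption: anything outside this set is treated as unsafe for a web file name.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def web_safe_name(name: str) -> str:
    """Replace every character outside the assumed safe set with an underscore."""
    return UNSAFE_CHARS.sub("_", name)


print(web_safe_name("Main image of part#123.jpg"))  # Main_image_of_part_123.jpg
```

Each unsafe character becomes its own underscore, so two adjacent spaces give two underscores; adding `+` to the pattern would collapse such runs into one. Inside the spreadsheet, a Google Sheets formula along the lines of `=REGEXREPLACE(F2, "[^A-Za-z0-9._-]", "_")` could do the same job.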
Some of the workflows were quite complex, particularly where two products in the catalogue had been represented by the same image (because the products look visually identical).
In this scenario, the workflow allowed you to create a separate, dedicated image for the second product, which was automatically highlighted in green so you could see what the renamer would do.
The solution also highlighted any duplication issues, such as inadvertently reusing a file name.
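The duplicate check can be sketched as a count of the proposed names. This assumes names should be compared case-insensitively, because a case-insensitive file system (the macOS default, for example) treats `Part_1.jpg` and `part_1.jpg` as the same file; `duplicate_names` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def duplicate_names(new_names: List[str]) -> List[str]:
    """Return (lower-cased) target file names that are proposed more than once."""
    counts = Counter(name.lower() for name in new_names if name)  # blanks are not clashes
    return sorted(name for name, count in counts.items() if count > 1)


print(duplicate_names(["Part_1.jpg", "part_2.jpg", "part_1.jpg", ""]))  # ['part_1.jpg']
```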
A final list of changes was produced automatically, showing each image's old name, new name, page and file location.
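Producing the change list can be sketched with Python's `csv` module. The row keys (`old_name`, `new_name`, `page`, `path`, `ignore`, `repeat`) are assumptions standing in for the spreadsheet columns, and `write_change_list` is hypothetical; rows that are ignored, or that are repeat uses of an image already handled, are left out.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

FIELDS = ["old_name", "new_name", "page", "path"]


def write_change_list(rows: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write a CSV row for every image that will be renamed; return how many were written."""
    # extrasaction="ignore" drops bookkeeping keys such as 'ignore' and 'repeat'.
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in rows:
        if row.get("ignore") or row.get("repeat"):
            continue  # ticked 'Ignore', or already handled at its first occurrence
        writer.writerow(row)
        written += 1
    return written


if __name__ == "__main__":
    buffer = io.StringIO()
    write_change_list(
        [
            {"old_name": "Main image of part#123.jpg", "new_name": "Main_image_of_part_123.jpg",
             "page": "4", "path": "/Jobs/Main image of part#123.jpg"},
            {"old_name": "logo.ai", "new_name": "logo.ai", "page": "3",
             "path": "/Jobs/logo.ai", "ignore": True},
        ],
        buffer,
    )
    print(buffer.getvalue())
```

When writing to a real file rather than a buffer, open it with `newline=""`, as the `csv` documentation recommends.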
The solution then used a Unix shell script to copy the images from their original locations into a new folder, renaming them on the fly.
On a selection of 1,500 images, the 'copy and rename' process took less than 20 seconds to run.
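The project used a Unix shell script for this step; to keep to one language here, the same copy-and-rename idea is sketched in Python instead. It assumes a list of (original path, new name) pairs taken from the change list. `copy_and_rename` is hypothetical, and refusing to overwrite an existing file is a design choice of this sketch rather than something the project is described as doing.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def copy_and_rename(changes: Iterable[Tuple[str, str]], dest_dir: str) -> int:
    """Copy each original file into dest_dir under its new name; originals stay untouched."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    for source, new_name in changes:
        target = dest / new_name
        if target.exists():
            # Guard against two rows mapping to the same new file name.
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.copy2(source, target)  # copy2 also tries to preserve metadata such as timestamps
        copied += 1
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "Main image of part#123.jpg"
        original.write_bytes(b"fake image data")
        renamed_dir = Path(tmp) / "renamed"
        count = copy_and_rename([(str(original), "Main_image_of_part_123.jpg")], str(renamed_dir))
        print(count, sorted(p.name for p in renamed_dir.iterdir()))
```

Copying rather than moving leaves the InDesign links intact, so the printed catalogue keeps working while the web folder is built.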
A list of low-resolution files was also produced automatically, so that the artwork department could see which images were too small to use and source replacements.
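The low-resolution list can be sketched as a filter over the Effective ppi values from the report. The 300 ppi cut-off is an assumption (a common print rule of thumb, not a figure from the project), as are the record keys and the hypothetical `min_ppi` and `low_resolution` helpers.

```python
import re
from typing import Dict, List, Optional


def min_ppi(value: str) -> Optional[float]:
    """Return the smaller axis of a ppi field such as '2190x2190', or None if it is blank."""
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", value)]
    return min(numbers) if numbers else None


def low_resolution(records: List[Dict[str, str]], threshold: float = 300.0) -> List[str]:
    """List image names whose effective ppi falls below the threshold."""
    flagged = []
    for record in records:
        ppi = min_ppi(record.get("effective_ppi", ""))
        # Entries without a ppi value (e.g. placed PDFs) cannot be judged, so skip them.
        if ppi is not None and ppi < threshold:
            flagged.append(record["name"])
    return flagged


print(low_resolution([
    {"name": "shutterstock.jpg", "effective_ppi": "2190x2190"},
    {"name": "small.jpg", "effective_ppi": "150x150"},
    {"name": "advert.pdf", "effective_ppi": ""},
]))  # ['small.jpg']
```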
Make your data work for you. Transform your business with automation and valuable insights.