Data automation
Converting printed catalogue images for use on an eCommerce store.
This case study shows how data automation considerably reduced the number of hours needed to convert printed catalogue product images into a format suitable for an eCommerce website.
The added challenge was renaming all images to use their product code so that the eCommerce database could correctly display them.
The automation workflow cut the project hours by about 75% and also produced various audit reports for data governance.
The printed catalogue was produced in Adobe InDesign, a heavyweight publishing and page layout software package.
InDesign can export project metadata as a simple, semi-structured text report that identifies many attributes, such as fonts and images. A typical report will contain thousands of lines of text, laid out like the examples below.
PUBLICATION NAME: Filename.indd
PACKAGE DATE: 28/07/2021 15:25
Creation Date: 23/04/2021
Modification Date: 20/07/2021
SUMMARY
(Show Data for Hidden and Non-Printing Layers)
Fonts: 20 Fonts Used; 0 Missing, 10 Embedded, 0 Incomplete, 0 Protected
Links and Images: 6 Links Found; 0 Modified, 0 Missing, 0 Inaccessible
Images: 0 Embedded, 1 use RGB colour space
Colours and Inks: 4 Process Inks; 0 Spot Inks
CMS is ON
External Plug-ins 2
FontManagementIDCC2018.InDesignPlugin
Non Opaque Objects :On Page6
FONTS
22 Fonts Used; 1 Missing, 12 Embedded, 0 Incomplete, 0 Protected
- Name: Grotesque-Bold; Type: Type 1, Status: Embedded
Filename: NA
Full Name: Grotesque-Bold
First Used on Page: 3 (PDF) logo.ai
Protected: No
- Name: Bembo-Regular; Type: OpenType Type 1, Status: OK
Filename: Macintosh HD:Library:Fonts:Bembo:Bembo Regular.otf
Full Name: Bembo Regular
First Used on Page: 5
Protected: No
- Name: Frutiger-Roman; Type: Type 1, Status: OK
Filename: Macintosh HD:Users:name.surname:Library:Adobe Systems:Frutiger-Roman:001.000:2018248800:FrutiRom
Full Name: 12 Frutiger* 55 Roman 05103
First Used on Page: 4
Protected: No
The typical image metadata looks something like this:
- Name: advert.pdf; Type: Adobe Portable Document Format (PDF); Status: Linked
Filename: advert.pdf
Link Updated: Tuesday, 14 September 2021 10:34
File Last Modified:
Actual ppi: Effective ppi:
Layer Overrides: No
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/advert.pdf
Link Profile: None
On Page 6
- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked
Filename: shutterstock.jpg
Link Updated: Monday, 26 July 2021 14:11
File Last Modified:
Actual ppi: 300x300 Effective ppi: 2190x2190
Layer Overrides: N/A
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg
Link Profile: None
On Page 4
We were mainly interested in lifting out the colour-coded data below.
The images are renamed via a Google Sheet, which also acts as an audit trail: it keeps a record of the existing printed images and their renamed online counterparts. This is very important when reconciling files between the printed catalogue and the new online eCommerce database.
Using regular expressions, key information is automatically lifted out of the InDesign package report as shown on the right. The report identifies:
File name
Page location
Colourspace
File resolution (with an alert where the resolution is too low)
A field for the 'new' filename
Option to ignore the image
What the image will be renamed to (regardless of any new value in Column F)
A preview of the image on Google Drive
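As a rough illustration of this extraction step, the Python sketch below pulls a few fields out of the image section of a package report using regular expressions. It is a minimal sketch, not the workflow's actual code: the function name parse_images, the field names and the patterns are all assumptions based on the two sample entries shown earlier, and it expects to be given only the image section of the report.

```python
import re

# Sample text copied from the image section of the report shown earlier.
SAMPLE = """\
- Name: advert.pdf; Type: Adobe Portable Document Format (PDF); Status: Linked
Filename: advert.pdf
Link Updated: Tuesday, 14 September 2021 10:34
File Last Modified:
Actual ppi: Effective ppi:
Layer Overrides: No
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/advert.pdf
Link Profile: None
On Page 6
- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked
Filename: shutterstock.jpg
Link Updated: Monday, 26 July 2021 14:11
File Last Modified:
Actual ppi: 300x300 Effective ppi: 2190x2190
Layer Overrides: N/A
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg
Link Profile: None
On Page 4
"""

# One pattern per field (hypothetical field names). The ppi groups are
# optional because some entries, such as the linked PDF, report no resolution.
FIELDS = {
    "name": re.compile(r"^- Name: (.+?); Type:", re.MULTILINE),
    "type": re.compile(r"; Type: (.+?); Status:"),
    "actual_ppi": re.compile(r"Actual ppi: *(\d+x\d+)?"),
    "effective_ppi": re.compile(r"Effective ppi: *(\d+x\d+)?"),
    "path": re.compile(r"^Complete Name: (.+)$", re.MULTILINE),
    "page": re.compile(r"^On Page (\d+)$", re.MULTILINE),
}


def parse_images(report_text):
    """Return one dict per image entry; missing values are None."""
    images = []
    # Split just before each line that starts a new "- Name: ..." entry.
    for block in re.split(r"(?m)^(?=- Name: )", report_text):
        if not block.strip():
            continue
        record = {}
        for key, pattern in FIELDS.items():
            match = pattern.search(block)
            record[key] = match.group(1) if match else None
        images.append(record)
    return images


images = parse_images(SAMPLE)
for image in images:
    print(image["name"], image["page"], image["type"], image["effective_ppi"])
```

The colour space is left inside the type string here (for example JPEG RGB); splitting it out further would depend on how consistently the report formats that field.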
Where images are used within the printed catalogue multiple times, the 'Rename?' field is automatically greyed out for the repeated images.
Ticking the 'Ignore' checkbox in column G turns the 'Rename?' field black, which is then ignored by the rename column.
Column H automatically determines a suitable 'web-safe' filename for the online image. It allows A-Z, a-z, 0-9 and - (dash) and replaces any other character in the name with _ (underscore); the file extension is kept.
For example, Main image of part#123.jpg would automatically be renamed Main_image_of_part_123.jpg
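The same rule can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the formula used in the sheet; the function name web_safe is hypothetical, and leaving the extension untouched is an assumption taken from the example above, where .jpg survives.

```python
import os
import re


def web_safe(filename):
    """Replace every character outside A-Z, a-z, 0-9 and '-' with '_'."""
    # Assumption: the extension (".jpg") is split off first and kept as-is.
    stem, extension = os.path.splitext(filename)
    return re.sub(r"[^A-Za-z0-9-]", "_", stem) + extension


print(web_safe("Main image of part#123.jpg"))
```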
Where a duplicate image is used to represent another product (i.e. one with a different product code), you can enter a value in one of the grey cells. The value then turns green to indicate that the image will be saved as a new file based on the different product code.
If, however, any duplicate product codes are entered, the relevant fields turn red, allowing you to spot errors in advance.
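The duplicate check behind the red highlighting can be approximated as follows. This is a hedged sketch of the idea only: in the workflow the check lives in the Google Sheet, and the function name duplicate_names and the product codes below are made up for illustration.

```python
from collections import Counter


def duplicate_names(new_names):
    """Return the set of target filenames that appear more than once."""
    # Blank cells (no new name entered) are skipped.
    counts = Counter(name for name in new_names if name)
    return {name for name, count in counts.items() if count > 1}


# Hypothetical column of new filenames; "A100.jpg" was entered twice.
column = ["A100.jpg", "B200.jpg", "A100.jpg", ""]
print(duplicate_names(column))
```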
Once the images have been reviewed, and either new names assigned or images ignored, a final list of changes is automatically produced.
In addition, a Unix shell script is automatically created which can copy the images from their original locations and rename them on the fly.
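To give a feel for how such a script might be generated, here is a minimal Python sketch that turns a list of (original path, new filename) pairs into a shell script of cp commands. The function name build_copy_script, the destination folder web_images and the new filename used below are assumptions for illustration; the workflow's own generator may differ. Paths are quoted with shlex.quote so that spaces, as in "Shared drives", do not break the commands.

```python
import shlex


def build_copy_script(renames, dest_dir):
    """Return the text of a shell script that copies and renames each file."""
    lines = [
        "#!/bin/sh",
        "set -e",  # stop at the first failed copy
        f"mkdir -p {shlex.quote(dest_dir)}",
    ]
    for source_path, new_name in renames:
        target = f"{dest_dir}/{new_name}"
        lines.append(f"cp {shlex.quote(source_path)} {shlex.quote(target)}")
    return "\n".join(lines) + "\n"


# The source path comes from the sample report; the new name is hypothetical.
renames = [
    ("/Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg", "shutterstock.jpg"),
]
script = build_copy_script(renames, "web_images")
print(script)
```

Writing the commands to a script, instead of copying the files directly, leaves a reviewable record of exactly what will be copied and renamed before anything is run.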
Additional workflows can then be used to convert all images from print colourspace (CMYK) to web colourspace (RGB) and then optimise them as web JPG files, at a specific size.
Based on a selection of 800 images, the 'copy and rename' process took about 11 seconds to run.
A list of low-resolution files was also automatically produced so that the artwork department could source better-quality versions.
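A low-resolution list of this kind could be derived from the effective ppi values in the report, along the lines of the sketch below. The 300 ppi threshold, the function name low_resolution and the second sample entry are assumptions for illustration; the case study does not state the cut-off that was used.

```python
def low_resolution(images, min_ppi=300):
    """Return the names of images whose effective ppi is below min_ppi."""
    flagged = []
    for image in images:
        ppi = image.get("effective_ppi")
        if not ppi:
            continue  # no resolution reported, e.g. for a linked PDF
        width_ppi, height_ppi = (int(value) for value in ppi.split("x"))
        if min(width_ppi, height_ppi) < min_ppi:
            flagged.append(image["name"])
    return flagged


sample = [
    {"name": "advert.pdf", "effective_ppi": None},
    {"name": "shutterstock.jpg", "effective_ppi": "2190x2190"},
    {"name": "small_photo.jpg", "effective_ppi": "72x72"},  # hypothetical entry
]
print(low_resolution(sample))
```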
Make your data work for you. Transform your business with automation and valuable insights.