Data automation

Converting printed catalogue images for use on an eCommerce store.

This case study highlights the benefits of data automation: it considerably reduced the number of hours required to convert printed catalogue product images into a format suitable for an eCommerce website.

The added challenge was renaming all images to use their product code so that the eCommerce database could correctly display them.

The automation workflow cut project hours by about 75% and also produced various audit reports for data governance.

The printed catalogue was produced in Adobe InDesign, a heavyweight publishing and page layout package.

InDesign can export project metadata as a simple, semi-structured text report that lists attributes such as fonts and images. A typical report contains thousands of lines of text, laid out like the examples below.


PACKAGE DATE: 28/07/2021 15:25

Creation Date: 23/04/2021

Modification Date: 20/07/2021


(Show Data for Hidden and Non-Printing Layers)

Fonts: 20 Fonts Used; 0 Missing, 10 Embedded, 0 Incomplete, 0 Protected

Links and Images: 6 Links Found; 0 Modified, 0 Missing, 0 Inaccessible

Images: 0 Embedded, 1 use RGB colour space

Colours and Inks: 4 Process Inks; 0 Spot Inks


External Plug-ins 2


Non Opaque Objects :On Page6


22 Fonts Used; 1 Missing, 12 Embedded, 0 Incomplete, 0 Protected

- Name: Grotesque-Bold; Type: Type 1, Status: Embedded

Filename: NA

Full Name: Grotesque-Bold

First Used on Page: 3 (PDF)

Protected: No

- Name: Bembo-Regular; Type: OpenType Type 1, Status: OK

Filename: Macintosh HD:Library:Fonts:Bembo:Bembo Regular.otf

Full Name: Bembo Regular

First Used on Page: 5

Protected: No

- Name: Frutiger-Roman; Type: Type 1, Status: OK

Filename: Macintosh HD:Users:name.surname:Library:Adobe Systems:Frutiger-Roman:001.000:2018248800:FrutiRom

Full Name: 12 Frutiger* 55 Roman 05103

First Used on Page: 4

Protected: No

The typical image metadata looks something like this:

- Name: advert.pdf; Type: Adobe Portable Document Format (PDF); Status: Linked

Filename: advert.pdf

Link Updated: Tuesday, 14 September 2021 10:34

File Last Modified:

Actual ppi: Effective ppi:

Layer Overrides: No

Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/advert.pdf

Link Profile: None

On Page 6

- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked

Filename: shutterstock.jpg

Link Updated: Monday, 26 July 2021 14:11

File Last Modified:

Actual ppi: 300x300 Effective ppi: 2190x2190

Layer Overrides: N/A

Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg

Link Profile: None

On Page 4

We were mainly interested in lifting out the colour-coded data below.

Images are renamed via a Google Sheet, which doubles as an audit trail: it records each existing printed image alongside its renamed online counterpart. This is very important when reconciling files between the printed catalogue and the new online eCommerce database.

Using regular expressions, key information is automatically lifted out of the InDesign package report as shown on the right. The report identifies:

  • File name

  • Page location

  • Colourspace

  • File resolution (with an alert where the resolution is too low)

  • A field for the 'new' filename

  • Option to ignore the image

  • What the image will be renamed to (regardless of any new value in Column F)

  • A preview of the image on Google Drive
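The extraction step can be sketched as follows. This is a minimal illustration in Python, not the workflow's actual code: the patterns are inferred only from the sample report lines shown earlier, and the names `SAMPLE_REPORT` and `parse_images` are hypothetical. It relies on image entries using `; Status:` where font entries use `, Status:`, which is true of the samples above but is an assumption about the report format in general.

```python
import re

# Hypothetical excerpt in the layout shown earlier; a real report is far longer.
SAMPLE_REPORT = """\
- Name: Bembo-Regular; Type: OpenType Type 1, Status: OK
First Used on Page: 5
- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked
Filename: shutterstock.jpg
Actual ppi: 300x300 Effective ppi: 2190x2190
Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg
On Page 4
"""

# Image entries separate Type and Status with a semicolon (assumed from the samples).
ENTRY = re.compile(r"^- Name: (?P<name>[^;]+); Type: (?P<type>[^;]+); Status: (?P<status>.+)$")
PPI = re.compile(r"^Actual ppi:\s*(?P<actual>\d+x\d+)?\s*Effective ppi:\s*(?P<effective>\d+x\d+)?")
PATH = re.compile(r"^Complete Name: (?P<path>.+)$")
PAGE = re.compile(r"^On Page (?P<page>\d+)$")


def parse_images(report_text):
    """Return one dict per image entry found in the report text."""
    images = []
    current = None
    for raw in report_text.splitlines():
        line = raw.strip()
        if line.startswith("- Name:"):
            # A new entry begins; keep it only if it looks like an image link.
            match = ENTRY.match(line)
            current = None
            if match:
                current = {"name": match.group("name").strip(),
                           "type": match.group("type").strip(),
                           "effective_ppi": None, "path": None, "page": None}
                images.append(current)
            continue
        if current is None:
            continue  # still inside the header or a font entry
        match = PPI.match(line)
        if match:
            current["effective_ppi"] = match.group("effective")  # None when blank
            continue
        match = PATH.match(line)
        if match:
            current["path"] = match.group("path")
            continue
        match = PAGE.match(line)
        if match:
            current["page"] = int(match.group("page"))
    return images


images = parse_images(SAMPLE_REPORT)
```

Each resulting dict maps naturally onto one row of the sheet (file name, page, type, resolution, path); the font entry in the sample is skipped.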

Where images are used within the printed catalogue multiple times, the 'Rename?' field is automatically greyed out for the repeated images.

Ticking the 'Ignore' checkbox in column G turns the 'Rename?' field black, and the rename column then skips that image.

Column H automatically determines a 'web-safe' filename for the online image: it keeps A-Z, a-z, 0-9 and - (dash), and replaces any other character with _ (underscore).

For example, 'Main image of part#123.jpg' would automatically be renamed 'Main_image_of_part_123.jpg'.
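The same rule can be expressed in a few lines of Python. This is a sketch only, since the case study implements it as a Google Sheet formula; the function name `web_safe_name` is hypothetical. It assumes the rule applies to the part of the name before the file extension, because the example above keeps '.jpg' intact.

```python
import os
import re


def web_safe_name(filename):
    """Replace every character outside A-Z, a-z, 0-9 and '-' with '_'.

    Assumption: the file extension is left untouched, as in the example above.
    """
    stem, extension = os.path.splitext(filename)
    safe_stem = re.sub(r"[^A-Za-z0-9-]", "_", stem)
    return safe_stem + extension


renamed = web_safe_name("Main image of part#123.jpg")
print(renamed)  # Main_image_of_part_123.jpg
```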

Where a duplicate image is used to represent another product (i.e. with a different product code), you can enter a value in one of the grey cells; the value then turns green to indicate that it will be saved as a new image under the different product code.

If, however, any duplicate product codes are entered, the relevant fields turn red, allowing you to spot errors in advance.
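The duplicate check behind the red highlighting could look something like this in Python. It is a sketch rather than the sheet's actual conditional-formatting rule: the function name `find_duplicate_codes` is hypothetical, and comparing codes case-insensitively with surrounding spaces ignored is an assumption.

```python
from collections import Counter


def find_duplicate_codes(entered_codes):
    """Return the product codes entered more than once, sorted.

    Assumptions: blank cells are skipped, and codes are compared
    case-insensitively after trimming surrounding whitespace.
    """
    normalised = [code.strip().lower() for code in entered_codes if code and code.strip()]
    counts = Counter(normalised)
    return sorted(code for code, count in counts.items() if count > 1)


# Hypothetical column of 'new' filenames, including one clash and one blank cell.
duplicates = find_duplicate_codes(["ABC123", "XYZ9", "abc123 ", ""])
print(duplicates)  # ['abc123']
```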

Once the images have been reviewed, and either new names assigned or images ignored, a final list of changes is automatically produced.

In addition, a Unix shell script is automatically created which can copy the images from their original locations and rename them on the fly.
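Generating such a script might be sketched as below, assuming the reviewed sheet has been reduced to pairs of original path and new filename. The function name `build_copy_script`, the `web_images` destination folder and the product code `PRD-001` are all hypothetical. Paths are quoted with `shlex.quote` because the original locations contain spaces.

```python
import shlex


def build_copy_script(renames, dest_dir):
    """Build the text of a POSIX shell script that copies and renames images.

    renames: iterable of (source_path, new_filename) pairs.
    dest_dir: destination folder (hypothetical; created if missing).
    """
    dest_dir = dest_dir.rstrip("/")
    lines = ["#!/bin/sh", "set -e", "mkdir -p " + shlex.quote(dest_dir)]
    for source_path, new_filename in renames:
        destination = dest_dir + "/" + new_filename
        lines.append("cp " + shlex.quote(source_path) + " " + shlex.quote(destination))
    return "\n".join(lines) + "\n"


script = build_copy_script(
    [("/Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg", "PRD-001.jpg")],
    "web_images",
)
print(script)
```

Writing the commands to a script, rather than copying the files directly, keeps a reviewable record of exactly what was copied and renamed.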

Additional workflows can then be used to convert all images from print colourspace (CMYK) to web colourspace (RGB) and then optimise them as web JPG files, at a specific size.

Based on a selection of 800 images, the 'copy and rename' process took about 11 seconds to run.

A list of low-resolution files was also produced automatically so that the artwork department could source better-quality versions.
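A resolution check of this kind can be sketched in Python as follows. The 300 ppi threshold is an assumption chosen for illustration, as the case study does not state the cut-off used; the function name `low_resolution_files` and the file `thumb.jpg` are hypothetical. The ppi strings follow the 'Effective ppi' format in the sample report.

```python
MIN_PPI = 300  # assumed threshold; the real cut-off is not given in the case study


def low_resolution_files(images, min_ppi=MIN_PPI):
    """Return names of images whose effective ppi falls below min_ppi.

    images: iterable of (name, effective_ppi) pairs, where effective_ppi is a
    string such as '2190x2190', or None when the report gives no value.
    """
    flagged = []
    for name, effective_ppi in images:
        if not effective_ppi:
            continue  # e.g. linked PDFs, which report no ppi
        horizontal, vertical = (int(value) for value in effective_ppi.split("x"))
        if min(horizontal, vertical) < min_ppi:
            flagged.append(name)
    return flagged


flagged = low_resolution_files([
    ("shutterstock.jpg", "2190x2190"),
    ("thumb.jpg", "72x72"),
    ("advert.pdf", None),
])
print(flagged)  # ['thumb.jpg']
```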

Make your data work for you. Transform your business with automation and valuable insights.