Data automation

Converting printed catalogue images for use on an eCommerce store.

The client had 4 people spend 3 days trying to manually find and rename 1,500 catalogue images so that they could be handed to the web agency building the client's online eCommerce store. The results were inconsistent and the work was rejected.

We got involved and, in just under 2 days, developed an automated process that could consistently find, rename and process the catalogue images.

The starting point was to automatically extract metadata from the printed catalogue software (Adobe InDesign). We then used this metadata to analyse all 1,500 images: their page location, file name, colour space and file format. The metadata ran to thousands of lines of text, but we were able to extract the key information we needed.

Below is an example of the metadata. 

PUBLICATION NAME: Filename.indd

PACKAGE DATE: 28/07/2021 15:25

Creation Date: 23/04/2021

Modification Date: 20/07/2021

SUMMARY

(Show Data for Hidden and Non-Printing Layers)

Fonts: 20 Fonts Used; 0 Missing, 10 Embedded, 0 Incomplete, 0 Protected

Links and Images: 6 Links Found; 0 Modified, 0 Missing, 0 Inaccessible

Images: 0 Embedded, 1 use RGB colour space

Colours and Inks: 4 Process Inks; 0 Spot Inks

CMS is ON

External Plug-ins 2

FontManagementIDCC2018.InDesignPlugin

Non Opaque Objects :On Page6

FONTS

22 Fonts Used; 1 Missing, 12 Embedded, 0 Incomplete, 0 Protected

- Name: Grotesque-Bold; Type: Type 1, Status: Embedded

Filename: NA

Full Name: Grotesque-Bold

First Used on Page: 3 (PDF) logo.ai

Protected: No

- Name: Bembo-Regular; Type: OpenType Type 1, Status: OK

Filename: Macintosh HD:Library:Fonts:Bembo:Bembo Regular.otf

Full Name: Bembo Regular

First Used on Page: 5

Protected: No

- Name: Frutiger-Roman; Type: Type 1, Status: OK

Filename: Macintosh HD:Users:name.surname:Library:Adobe Systems:Frutiger-Roman:001.000:2018248800:FrutiRom

Full Name: 12 Frutiger* 55 Roman 05103

First Used on Page: 4

Protected: No

Data relating to the images typically looks like the example below, so we needed to write code that could identify the specific lines we needed.

- Name: advert.pdf; Type: Adobe Portable Document Format (PDF); Status: Linked

Filename: advert.pdf

Link Updated: Tuesday, 14 September 2021 10:34

File Last Modified:

Actual ppi: Effective ppi:

Layer Overrides: No

Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/advert.pdf

Link Profile: None

On Page 6

- Name: shutterstock.jpg; Type: JPEG RGB; Status: Linked

Filename: shutterstock.jpg

Link Updated: Monday, 26 July 2021 14:11

File Last Modified:

Actual ppi: 300x300 Effective ppi: 2190x2190

Layer Overrides: N/A

Complete Name: /Volumes/GoogleDrive/Shared drives/Jobs/shutterstock.jpg

Link Profile: None

On Page 4

We were mainly interested in only a few lines of each image record, such as the image's name, type, file location and page.
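As an illustration of how such lines can be picked out, here is a minimal Python sketch using regular expressions. It is not the code used on the project: the field labels are taken from the sample metadata above, while the patterns, the `parse_links` function name and the dictionary keys are our own assumptions.

```python
import re

# Hypothetical patterns; the labels come from the sample report above,
# the regular expressions themselves are illustrative assumptions.
NAME_RE = re.compile(
    r"^- Name: (?P<name>[^;]+); Type: (?P<type>[^;]+); Status: (?P<status>.+)$"
)
PATH_RE = re.compile(r"^Complete Name: (?P<path>.+)$")
PAGE_RE = re.compile(r"^On Page (?P<page>\d+)$")
PPI_RE = re.compile(
    r"^Actual ppi: (?P<actual>\d+x\d+)?\s*Effective ppi:\s*(?P<effective>\d+x\d+)?$"
)


def parse_links(report_text):
    """Return one dict per linked file found in the report text."""
    records = []
    current = None
    for raw in report_text.splitlines():
        line = raw.strip()
        match = NAME_RE.match(line)
        if match:
            # A "- Name: ...; Type: ...; Status: ..." line starts a new record.
            current = match.groupdict()
            records.append(current)
            continue
        if current is None:
            continue  # still in the summary part of the report
        for pattern in (PATH_RE, PAGE_RE, PPI_RE):
            match = pattern.match(line)
            if match:
                current.update(match.groupdict())
                break
    return records
```

Font entries in the sample separate the type and status with a comma rather than a semicolon, so the name pattern skips them. Missing ppi values, as in the advert.pdf record, come back as `None`.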

We used a cloud-based spreadsheet (Google Sheets) to keep a record of the required images, their current file name, and the 'web-friendly' new file name based on the product code. This was critical in being able to reconcile the images and capture their transformation, before and after.

The code used regular expressions to automatically copy the key information out of the InDesign metadata file and into a final report in the spreadsheet. The report works as follows.

Where images are used within the printed catalogue multiple times (like on headers and footers), the 'Rename?' column is automatically greyed out to show that the image has already been dealt with.

There was a workflow to ignore specific images, like ones that are not related to products. This was easily achieved by ticking the 'Ignore' checkbox in column G.

Column H automatically determines a suitable 'web-safe' filename for the online image. 

It takes the product code in column F and automatically works out which characters are 'web-friendly'. Any problematic characters are automatically replaced with an underscore.

For example, Main image of part#123.jpg would automatically be renamed to Main_image_of_part_123.jpg.
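The replacement rule can be sketched in a few lines of Python. This is an illustration rather than the spreadsheet formula used on the project. The set of 'web-friendly' characters (letters, digits, dot, hyphen and underscore) and the `web_safe_name` function name are assumptions.

```python
import re

# Assumption: only letters, digits, dot, hyphen and underscore are web-friendly.
UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def web_safe_name(name):
    """Replace every character outside the allowed set with an underscore."""
    return UNSAFE_CHARS.sub("_", name)


print(web_safe_name("Main image of part#123.jpg"))
# prints Main_image_of_part_123.jpg
```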

Some of the workflows were quite complex, particularly where two products in the catalogue had been represented by the same image (visually the products look identical).

In this scenario, the workflow would allow you to create a separate and dedicated image for the second product, which would automatically be highlighted in green so that you could see what the renamer would do.

The solution would also highlight any duplication issues, if you inadvertently used the same file name again.
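A duplicate check of this kind can be sketched as follows. The project did this inside the spreadsheet; the Python version below only illustrates the idea. The `find_duplicate_names` function name is hypothetical, and comparing names without regard to case is an assumption.

```python
from collections import Counter


def find_duplicate_names(new_names):
    """Return the new file names that have been used more than once."""
    # Assumption: names differing only by case are treated as clashes.
    counts = Counter(name.lower() for name in new_names)
    return sorted(name for name, used in counts.items() if used > 1)


print(find_duplicate_names(["part_123.jpg", "part_124.jpg", "Part_123.jpg"]))
# prints ['part_123.jpg']
```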

A final list of changes was automatically produced, showing old name, new name, page and file location.

The solution then used a Unix shell script to automatically copy the images from their original locations and rename them on the fly, putting them in a new folder.

Based on a selection of 1,500 images, the 'copy and rename' process took less than 20 seconds to run.
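The project used a Unix shell script for this step. The sketch below shows the same 'copy and rename' idea in Python instead, so it is not the script that was run. The `copy_and_rename` function and its `(source_path, new_name)` input format are assumptions.

```python
import shutil
from pathlib import Path


def copy_and_rename(changes, dest_dir):
    """Copy each source file into dest_dir under its new name.

    changes is an iterable of (source_path, new_name) pairs.
    The originals are left untouched.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for source, new_name in changes:
        target = dest / new_name
        if target.exists():
            # Refuse to overwrite, so a duplicate name cannot lose an image.
            raise FileExistsError(f"{target} already exists")
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Copying rather than moving keeps the original files in place, so the print artwork still links to them.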

A list of low-resolution files was also automatically produced so that the artwork department could see which images were too small to use, allowing them to source new images.
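One way to flag such files is to test the 'Effective ppi' value from the metadata against a minimum. The Python sketch below is an illustration only. The `is_low_resolution` function name, the use of effective ppi as the measure and the 150 ppi threshold are all assumptions, not the rule the project applied.

```python
def is_low_resolution(effective_ppi, min_ppi=150):
    """Return True if an 'Effective ppi' value such as '2190x2190' is too low.

    min_ppi is a hypothetical threshold. Empty or missing values (as for
    linked PDFs in the sample) are treated as unknown and not flagged.
    """
    if not effective_ppi:
        return False
    horizontal, vertical = (int(value) for value in effective_ppi.split("x"))
    return min(horizontal, vertical) < min_ppi


print(is_low_resolution("2190x2190"))  # prints False
print(is_low_resolution("72x72"))      # prints True
```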

Make your data work for you. Transform your business with automation and valuable insights.