Playing with JPEG quality and file size

This is probably no news at all for graphics/image processing experts, but it's something I've just learnt myself, and I thought it would be fun to share.

I am writing a static HTML photo album generator and was a little suspicious of the size of the generated JPEG images. I thought, "well, these JPEGs should not be that large..."

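I did some quick research and found out that ImageMagick uses a JPEG quality of 92 by default (when it cannot estimate the quality of the input image; otherwise it reuses that estimate), and I was curious how the file size would vary as I changed the output quality.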

Then I took an image and produced thumbnails of it with the "JPEG quality" parameter ranging from 1 to 100, to check 1) how the file size varies with quality and 2) how much the quality setting actually makes a visible difference when viewing the images.

To generate the thumbnails with varying quality, I did the following:

$ for i in $(seq -f %03g 1 100); do convert /path/to/original.jpg -scale 640x480 -quality "$i" "$i.jpg"; echo "$i"; done

Then I generated a data file by calculating the size of each file with du and piping the results through sed and awk:

$ du -b [0-9]*.jpg | sed 's/\.jpg$//' | awk '{ print $2 " " $1 }' > points.dat
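The -b flag of du is a GNU coreutils extension, so the pipeline above may not work as-is on BSD-derived systems such as macOS. A minimal portable sketch of the same step counts bytes with wc -c instead. It assumes the zero-padded thumbnails (001.jpg to 100.jpg) from the loop above are in the current directory; make_points is a hypothetical helper name, not part of the original workflow.

```shell
#!/bin/sh
# Hypothetical helper: a portable stand-in for the du | sed | awk pipeline
# (du -b is a GNU coreutils option). Prints "<quality> <bytes>" for every
# zero-padded thumbnail such as 075.jpg in the current directory.
make_points() {
    for f in [0-9]*.jpg; do
        [ -e "$f" ] || continue          # glob matched nothing: print nothing
        q=${f%.jpg}                      # 075.jpg -> 075
        size=$(( $(wc -c < "$f") ))      # byte count; $(( )) drops BSD wc padding
        printf '%s %s\n' "$q" "$size"
    done
}
```

Usage: make_points > points.dat writes the same two-column file that the R session reads.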

The generated data file looks like this, with the JPEG quality in the first column and the file size in bytes in the second:

001 20380
002 20383
003 20634
004 21106
[...]

Regarding file size, it seems that between quality 1 and 50 the file size grows sublinearly with quality. Beyond that, the curve passes an inflection point and grows in a way that looks, if not exponential, at least polynomial: from quality 90 to 100 alone, the thumbnail more than doubles, from 62162 to 145937 bytes.
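To put numbers on that description, the average size increase per quality step can be computed straight from the data file. This is a minimal awk sketch, assuming points.dat in the two-column format shown above, sorted by quality; growth_per_step and the bucket width of ten quality levels are illustrative choices, not part of the original workflow.

```shell
#!/bin/sh
# Hypothetical helper: average file-size increase per quality step,
# grouped into buckets of ten quality levels (1-10, 10-20, ..., 90-100).
# Expects "quality bytes" lines sorted by quality, like points.dat.
growth_per_step() {
    awk '
        NR == 1 { prev_q = $1 + 0; prev_s = $2; next }
        {
            q = $1 + 0                        # "075" -> 75
            b = int((q - 1) / 10)             # 2..10 -> 0, 11..20 -> 1, ...
            if (!(b in n)) start[b] = prev_q  # quality the bucket starts from
            sum[b] += $2 - prev_s             # bytes added by this one step
            n[b]++
            last[b] = q
            if (b > maxb) maxb = b
            prev_q = q; prev_s = $2
        }
        END {
            for (b = 0; b <= maxb; b++)
                if (b in n)
                    printf "%d-%d %.1f\n", start[b], last[b], sum[b] / n[b]
        }' "$1"
}
```

Run on the data from this post (growth_per_step points.dat), it reports about 370 bytes per step between quality 1 and 10, bottoming out around 249 bytes per step between 40 and 50, then rising to about 1346 bytes per step between 80 and 90 and 8377.5 between 90 and 100.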

The above plot was produced in an R session that looked like this:

$ R

R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i486-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> png()
> data <- read.table('points.dat')
> quality <- data[[1]]
> quality
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
> filesize <- data[[2]]
> filesize
  [1]  20380  20383  20634  21106  21551  22012  22469  22878  23323  23715
 [11]  24103  24494  24952  25327  25725  26127  26507  26886  27216  27550
 [21]  27917  28288  28627  28945  29271  29583  29919  30280  30516  30813
 [31]  31099  31367  31679  31873  32232  32538  32704  33072  33324  33443
 [41]  33860  34055  34253  34633  34804  35074  35216  35491  35871  35935
 [51]  36030  36443  36743  36898  37120  37382  37726  38077  38307  38581
 [61]  39002  39270  39700  39962  40388  40762  41086  41629  42062  42544
 [71]  43048  43392  44062  44824  45023  45682  46532  47347  47833  48701
 [81]  49612  50423  51694  52637  53635  55243  56340  58304  59709  62162
 [91]  64207  66273  70073  74617  79917  86745  94950 105680 128158 145937
> plot(quality, filesize, xlab = 'JPEG Quality', ylab = 'File size')
> 
Save workspace image? [y/n/c]: y

Looking at the actual generated thumbnails, somewhere above quality 60 I stopped noticing any difference as the quality factor increased. Settling on a default quality of 75 seems to be good enough: the static HTML album generated from a folder with 82 pictures dropped from 12MB with the default ImageMagick quality factor to 6MB with quality 75, with very little perceivable loss in image quality.
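The per-image saving can also be read off the data file. This minimal sketch assumes points.dat in the two-column format used above; size_ratio is a hypothetical helper name, not part of the original workflow. It compares the thumbnail size at a reference quality and a candidate quality.

```shell
#!/bin/sh
# Hypothetical helper: compare file sizes at two quality levels in a
# "quality bytes" data file such as points.dat.
# Usage: size_ratio FILE REFERENCE_QUALITY CANDIDATE_QUALITY
size_ratio() {
    awk -v ref="$2" -v cand="$3" '
        $1 + 0 == ref + 0  { r = $2 }   # "092" matches 92
        $1 + 0 == cand + 0 { c = $2 }
        END {
            if (r == "" || c == "") {
                print "quality level not found" > "/dev/stderr"
                exit 1
            }
            printf "%d -> %d: %d -> %d bytes (%.1f%%)\n", ref, cand, r, c, 100 * c / r
        }' "$1"
}
```

With the thumbnail data from this post, size_ratio points.dat 92 75 prints "92 -> 75: 66273 -> 45023 bytes (67.9%)". So for this one test image, quality 75 saves about a third of the file size compared with quality 92.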