Playing with JPEG quality and file size
This is probably no news at all for graphics/image processing experts, but it's something I've just learnt myself and I thought it would be fun to share.
I am writing a static HTML photo album generator and was a little suspicious of the size of the generated JPEG images. I thought, "well, these JPEGs should not be that large ..."
I did some quick research and found out that ImageMagick uses JPEG quality 92 by default, and I was curious how file size would vary as I changed the output quality.
So I took an image and produced thumbnails for it with the "JPEG quality" parameter ranging from 1 to 100, to check 1) how the file size varies with quality and 2) how much difference the quality setting actually makes when viewing the images.
To generate the thumbnails with varying quality, I did the following:
$ for i in $(seq -f %03g 1 100); do convert -scale 640x480 -quality $i /path/to/original.jpg $i.jpg; echo $i; done
Then I generated a data file by calculating the size of each file with du and piping the results through sed and awk:
$ du -b [0-9]*.jpg | sed 's/\.jpg$//' | awk '{ print $2 " " $1 }' > points.dat
The generated data file looks like this, with JPEG quality in the first column and file size in bytes in the second column:
001 20380
002 20383
003 20634
004 21106
[...]
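As far as I know, the -b option of du is specific to GNU coreutils, so the pipeline above may not work on other systems. The following is a minimal sketch of the same step using only wc and basename. The function name make_points is made up for this example, and it assumes the thumbnails are named 001.jpg to 100.jpg as produced by the loop above.

```shell
#!/bin/sh
# Sketch: print "quality size" lines for every thumbnail in a directory.
# make_points is a hypothetical helper name, not something from ImageMagick.
make_points() {
    # $1: directory holding the numbered thumbnails (001.jpg ... 100.jpg)
    for f in "$1"/[0-9]*.jpg; do
        [ -e "$f" ] || continue               # glob matched nothing: skip
        q=$(basename "$f" .jpg)               # quality is encoded in the file name
        size=$(wc -c < "$f" | tr -d ' ')      # byte count; tr strips padding some wc versions add
        echo "$q $size"
    done
}

# Usage (not run here): make_points . > points.dat
```

For regular files the byte count from wc -c should match what du -b reports, so the resulting data file has the same two columns.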
Regarding file size, it seems that between quality 1 and 50 the file size grows sublinearly with quality. Beyond that, the curve passes an inflection point and grows in a way that looks, if not exponential, at least polynomial.
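One rough way to check that reading without a plot is to look at how many bytes each extra quality step costs. The sketch below uses six (quality, size) pairs copied from the measurements in this post; the function names sample_points and bytes_per_step are invented for the example.

```shell
#!/bin/sh
# Six (quality, bytes) pairs taken from the measurements in this post.
sample_points() {
    cat <<'EOF'
1 20380
25 29271
50 35935
75 45023
92 66273
100 145937
EOF
}

# For each pair of consecutive rows, print the average number of bytes
# added per quality step between them.
bytes_per_step() {
    awk 'NR > 1 { printf "%d-%d: %.0f bytes/step\n", pq, $1, ($2 - ps) / ($1 - pq) }
         { pq = $1; ps = $2 }'
}

sample_points | bytes_per_step
```

On this sample the cost per step falls from about 370 bytes (1 to 25) to about 267 bytes (25 to 50), then climbs to about 364, 1250 and finally 9958 bytes per step between 92 and 100. Feeding the full data file through bytes_per_step gives the same picture one quality step at a time.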
The plot of file size against JPEG quality was produced in an R session that looked like this:
$ R
R version 2.13.1 (2011-07-08)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i486-pc-linux-gnu (32-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> png()
> data <- read.table('points.dat')
> quality <- data[[1]]
> quality
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100
> filesize <- data[[2]]
> filesize
[1] 20380 20383 20634 21106 21551 22012 22469 22878 23323 23715
[11] 24103 24494 24952 25327 25725 26127 26507 26886 27216 27550
[21] 27917 28288 28627 28945 29271 29583 29919 30280 30516 30813
[31] 31099 31367 31679 31873 32232 32538 32704 33072 33324 33443
[41] 33860 34055 34253 34633 34804 35074 35216 35491 35871 35935
[51] 36030 36443 36743 36898 37120 37382 37726 38077 38307 38581
[61] 39002 39270 39700 39962 40388 40762 41086 41629 42062 42544
[71] 43048 43392 44062 44824 45023 45682 46532 47347 47833 48701
[81] 49612 50423 51694 52637 53635 55243 56340 58304 59709 62162
[91] 64207 66273 70073 74617 79917 86745 94950 105680 128158 145937
> plot(quality, filesize, xlab = 'JPEG Quality', ylab = 'File size')
>
Save workspace image? [y/n/c]: y
Looking at the actual generated thumbnails, I stopped noticing any difference between successive quality factors somewhere above 60. Settling on a default quality of 75 seems good enough: the static HTML album generated from a folder with 82 pictures dropped from 12MB with the default ImageMagick quality factor to 6MB with quality 75, with very little perceptible loss of image quality.
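For comparison, here is the same kind of saving worked out for the single test thumbnail. This is only a sketch: the three sizes are copied from the R session output above, and saving_pct is a helper name made up for the example.

```shell
#!/bin/sh
# Thumbnail sizes in bytes at three quality settings,
# copied from the R session output in this post.
size_q60=38581
size_q75=45023
size_q92=66273   # quality 92, the ImageMagick default mentioned above

# saving_pct SMALLER LARGER: percentage saved by going from LARGER to SMALLER.
saving_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - a / b) }'
}

echo "quality 75 vs 92: $(saving_pct "$size_q75" "$size_q92")% smaller"
echo "quality 60 vs 92: $(saving_pct "$size_q60" "$size_q92")% smaller"
```

For this one thumbnail the saving comes out at 32.1% for quality 75 and 41.8% for quality 60. That is less than the halving seen for the whole album, which is not surprising: the album figure covers 82 different pictures, and how well each one compresses depends on its content.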