comments updated
[henge/apc.git] / stb / stb_image.h
1 /* stb_image - v2.14 - public domain image loader - http://nothings.org/stb_image.h
2 no warranty implied; use at your own risk
3
4 Do this:
5 #define STB_IMAGE_IMPLEMENTATION
6 before you include this file in *one* C or C++ file to create the implementation.
7
8 // i.e. it should look like this:
9 #include ...
10 #include ...
11 #include ...
12 #define STB_IMAGE_IMPLEMENTATION
13 #include "stb_image.h"
14
15 You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
16 And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
17
18
19 QUICK NOTES:
20 Primarily of interest to game developers and other people who can
21 avoid problematic images and only need the trivial interface
22
23 JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
24 PNG 1/2/4/8-bit-per-channel (16 bpc not supported)
25
26 TGA (not sure what subset, if a subset)
27 BMP non-1bpp, non-RLE
28 PSD (composited view only, no extra channels, 8/16 bit-per-channel)
29
30 GIF (*comp always reports as 4-channel)
31 HDR (radiance rgbE format)
32 PIC (Softimage PIC)
33 PNM (PPM and PGM binary only)
34
35 Animated GIF still needs a proper API, but here's one way to do it:
36 http://gist.github.com/urraka/685d9a6340b26b830d49
37
38 - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
39 - decode from arbitrary I/O callbacks
40 - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
41
42 Full documentation under "DOCUMENTATION" below.
43
44
45 Revision 2.00 release notes:
46
47 - Progressive JPEG is now supported.
48
49 - PPM and PGM binary formats are now supported, thanks to Ken Miller.
50
51 - x86 platforms now make use of SSE2 SIMD instructions for
52 JPEG decoding, and ARM platforms can use NEON SIMD if requested.
53 This work was done by Fabian "ryg" Giesen. SSE2 is used by
54 default, but NEON must be enabled explicitly; see docs.
55
56 With other JPEG optimizations included in this version, we see
57 2x speedup on a JPEG on an x86 machine, and a 1.5x speedup
58 on a JPEG on an ARM machine, relative to previous versions of this
59 library. The same results will not obtain for all JPGs and for all
60 x86/ARM machines. (Note that progressive JPEGs are significantly
61 slower to decode than regular JPEGs.) This doesn't mean that this
62 is the fastest JPEG decoder in the land; rather, it brings it
63 closer to parity with standard libraries. If you want the fastest
64 decode, look elsewhere. (See "Philosophy" section of docs below.)
65
66 See final bullet items below for more info on SIMD.
67
68 - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing
69 the memory allocator. Unlike other STBI libraries, these macros don't
70 support a context parameter, so if you need to pass a context in to
71 the allocator, you'll have to store it in a global or a thread-local
72 variable.
73
74 - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and
75 STBI_NO_LINEAR.
76 STBI_NO_HDR: suppress implementation of .hdr reader format
77 STBI_NO_LINEAR: suppress high-dynamic-range light-linear float API
78
79 - You can suppress implementation of any of the decoders to reduce
80 your code footprint by #defining one or more of the following
81 symbols before creating the implementation.
82
83 STBI_NO_JPEG
84 STBI_NO_PNG
85 STBI_NO_BMP
86 STBI_NO_PSD
87 STBI_NO_TGA
88 STBI_NO_GIF
89 STBI_NO_HDR
90 STBI_NO_PIC
91 STBI_NO_PNM (.ppm and .pgm)
92
93 - You can request *only* certain decoders and suppress all other ones
94 (this will be more forward-compatible, as addition of new decoders
95 doesn't require you to disable them explicitly):
96
97 STBI_ONLY_JPEG
98 STBI_ONLY_PNG
99 STBI_ONLY_BMP
100 STBI_ONLY_PSD
101 STBI_ONLY_TGA
102 STBI_ONLY_GIF
103 STBI_ONLY_HDR
104 STBI_ONLY_PIC
105 STBI_ONLY_PNM (.ppm and .pgm)
106
107 Note that you can define multiples of these, and you will get all
108 of them ("only x" and "only y" is interpreted to mean "only x&y").
109
110 - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
111 want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
112
113 - Compilation of all SIMD code can be suppressed with
114 #define STBI_NO_SIMD
115 It should not be necessary to disable SIMD unless you have issues
116 compiling (e.g. using an x86 compiler which doesn't support SSE
117 intrinsics or that doesn't support the method used to detect
118 SSE2 support at run-time), and even those can be reported as
119 bugs so I can refine the built-in compile-time checking to be
120 smarter.
121
122 - The old STBI_SIMD system which allowed installing a user-defined
123 IDCT etc. has been removed. If you need this, don't upgrade. My
124 assumption is that almost nobody was doing this, and those who
125 were will find the built-in SIMD more satisfactory anyway.
126
127 - RGB values computed for JPEG images are slightly different from
128 previous versions of stb_image. (This is due to using less
129 integer precision in SIMD.) The C code has been adjusted so
130 that the same RGB values will be computed regardless of whether
131 SIMD support is available, so your app should always produce
132 consistent results. But these results are slightly different from
133 previous versions. (Specifically, about 3% of available YCbCr values
134 will compute different RGB results from pre-1.49 versions by +-1;
135 most of the deviating values are one smaller in the G channel.)
136
137 - If you must produce consistent results with previous versions of
138 stb_image, #define STBI_JPEG_OLD and you will get the same results
139 you used to; however, you will not get the SIMD speedups for
140 the YCbCr-to-RGB conversion step (although you should still see
141 significant JPEG speedup from the other changes).
142
143 Please note that STBI_JPEG_OLD is a temporary feature; it will be
144 removed in future versions of the library. It is only intended for
145 near-term back-compatibility use.
146
147
148 Latest revision history:
149 2.13 (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
150 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
151 2.11 (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
152 RGB-format JPEG; remove white matting in PSD;
153 allocate large structures on the stack;
154 correct channel count for PNG & BMP
155 2.10 (2016-01-22) avoid warning introduced in 2.09
156 2.09 (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
157 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
158 2.07 (2015-09-13) partial animated GIF support
159 limited 16-bit PSD support
160 minor bugs, code cleanup, and compiler warnings
161
162 See end of file for full revision history.
163
164
165 ============================ Contributors =========================
166
167 Image formats Extensions, features
168 Sean Barrett (jpeg, png, bmp) Jetro Lauha (stbi_info)
169 Nicolas Schulz (hdr, psd) Martin "SpartanJ" Golini (stbi_info)
170 Jonathan Dummer (tga) James "moose2000" Brown (iPhone PNG)
171 Jean-Marc Lienher (gif) Ben "Disch" Wenger (io callbacks)
172 Tom Seddon (pic) Omar Cornut (1/2/4-bit PNG)
173 Thatcher Ulrich (psd) Nicolas Guillemot (vertical flip)
174 Ken Miller (pgm, ppm) Richard Mitton (16-bit PSD)
175 github:urraka (animated gif) Junggon Kim (PNM comments)
176 Daniel Gibson (16-bit TGA)
177 socks-the-fox (16-bit TGA)
178 Optimizations & bugfixes
179 Fabian "ryg" Giesen
180 Arseny Kapoulkine
181
182 Bug & warning fixes
183 Marc LeBlanc David Woo Guillaume George Martins Mozeiko
184 Christpher Lloyd Martin Golini Jerry Jansson Joseph Thomson
185 Dave Moore Roy Eltham Hayaki Saito Phil Jordan
186 Won Chun Luke Graham Johan Duparc Nathan Reed
187 the Horde3D community Thomas Ruf Ronny Chevalier Nick Verigakis
188 Janez Zemva John Bartholomew Michal Cichon github:svdijk
189 Jonathan Blow Ken Hamada Tero Hanninen Baldur Karlsson
190 Laurent Gomila Cort Stratton Sergio Gonzalez github:romigrou
191 Aruelien Pocheville Thibault Reuille Cass Everitt Matthew Gregan
192 Ryamond Barbiero Paul Du Bois Engin Manap github:snagar
193 Michaelangel007@github Oriol Ferrer Mesia Dale Weiler github:Zelex
194 Philipp Wiesemann Josh Tobin github:rlyeh github:grim210@github
195 Blazej Dariusz Roszkowski github:sammyhw
196
197
198 LICENSE
199
200 This software is dual-licensed to the public domain and under the following
201 license: you are granted a perpetual, irrevocable license to copy, modify,
202 publish, and distribute this file as you see fit.
203
204 */
205
206 #ifndef STBI_INCLUDE_STB_IMAGE_H
207 #define STBI_INCLUDE_STB_IMAGE_H
208
209 // DOCUMENTATION
210 //
211 // Limitations:
212 // - no 16-bit-per-channel PNG
213 // - no 12-bit-per-channel JPEG
214 // - no JPEGs with arithmetic coding
215 // - no 1-bit BMP
216 // - GIF always returns *comp=4
217 //
218 // Basic usage (see HDR discussion below for HDR usage):
219 // int x,y,n;
220 // unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
221 // // ... process data if not NULL ...
222 // // ... x = width, y = height, n = # 8-bit components per pixel ...
223 // // ... replace '0' with '1'..'4' to force that many components per pixel
224 // // ... but 'n' will always be the number that it would have been if you said 0
225 // stbi_image_free(data)
226 //
227 // Standard parameters:
228 // int *x -- outputs image width in pixels
229 // int *y -- outputs image height in pixels
230 // int *channels_in_file -- outputs # of image components in image file
231 // int desired_channels -- if non-zero, # of image components requested in result
232 //
233 // The return value from an image loader is an 'unsigned char *' which points
234 // to the pixel data, or NULL on an allocation failure or if the image is
235 // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
236 // with each pixel consisting of N interleaved 8-bit components; the first
237 // pixel pointed to is top-left-most in the image. There is no padding between
238 // image scanlines or between pixels, regardless of format. The number of
239 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise.
240 // If req_comp is non-zero, *comp has the number of components that _would_
241 // have been output otherwise. E.g. if you set req_comp to 4, you will always
242 // get RGBA output, but you can check *comp to see if it's trivially opaque
243 // because e.g. there were only 3 channels in the source image.
244 //
245 // An output image with N components has the following components interleaved
246 // in this order in each pixel:
247 //
248 // N=#comp components
249 // 1 grey
250 // 2 grey, alpha
251 // 3 red, green, blue
252 // 4 red, green, blue, alpha
253 //
254 // If image loading fails for any reason, the return value will be NULL,
255 // and *x, *y, *comp will be unchanged. The function stbi_failure_reason()
256 // can be queried for an extremely brief, end-user unfriendly explanation
257 // of why the load failed. Define STBI_NO_FAILURE_STRINGS to avoid
258 // compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
259 // more user-friendly ones.
260 //
261 // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
262 //
263 // ===========================================================================
264 //
265 // Philosophy
266 //
267 // stb libraries are designed with the following priorities:
268 //
269 // 1. easy to use
270 // 2. easy to maintain
271 // 3. good performance
272 //
273 // Sometimes I let "good performance" creep up in priority over "easy to maintain",
274 // and for best performance I may provide less-easy-to-use APIs that give higher
275 // performance, in addition to the easy to use ones. Nevertheless, it's important
276 // to keep in mind that from the standpoint of you, a client of this library,
277 // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all.
278 //
279 // Some secondary priorities arise directly from the first two, some of which
280 // make more explicit reasons why performance can't be emphasized.
281 //
282 // - Portable ("ease of use")
283 // - Small footprint ("easy to maintain")
284 // - No dependencies ("ease of use")
285 //
286 // ===========================================================================
287 //
288 // I/O callbacks
289 //
290 // I/O callbacks allow you to read from arbitrary sources, like packaged
291 // files or some other source. Data read from callbacks are processed
292 // through a small internal buffer (currently 128 bytes) to try to reduce
293 // overhead.
294 //
295 // The three functions you must define are "read" (reads some bytes of data),
296 // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
297 //
298 // ===========================================================================
299 //
300 // SIMD support
301 //
302 // The JPEG decoder will try to automatically use SIMD kernels on x86 when
303 // supported by the compiler. For ARM Neon support, you must explicitly
304 // request it.
305 //
306 // (The old do-it-yourself SIMD API is no longer supported in the current
307 // code.)
308 //
309 // On x86, SSE2 will automatically be used when available based on a run-time
310 // test; if not, the generic C versions are used as a fall-back. On ARM targets,
311 // the typical path is to have separate builds for NEON and non-NEON devices
312 // (at least this is true for iOS and Android). Therefore, the NEON support is
313 // toggled by a build flag: define STBI_NEON to get NEON loops.
314 //
315 // The output of the JPEG decoder is slightly different from versions where
316 // SIMD support was introduced (that is, for versions before 1.49). The
317 // difference is only +-1 in the 8-bit RGB channels, and only on a small
318 // fraction of pixels. You can force the pre-1.49 behavior by defining
319 // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path
320 // and hence cost some performance.
321 //
322 // If for some reason you do not want to use any of SIMD code, or if
323 // you have issues compiling it, you can disable it entirely by
324 // defining STBI_NO_SIMD.
325 //
326 // ===========================================================================
327 //
328 // HDR image support (disable by defining STBI_NO_HDR)
329 //
330 // stb_image now supports loading HDR images in general, and currently
331 // the Radiance .HDR file format, although the support is provided
332 // generically. You can still load any file through the existing interface;
333 // if you attempt to load an HDR file, it will be automatically remapped to
334 // LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
335 // both of these constants can be reconfigured through this interface:
336 //
337 // stbi_hdr_to_ldr_gamma(2.2f);
338 // stbi_hdr_to_ldr_scale(1.0f);
339 //
340 // (note, do not use _inverse_ constants; stbi_image will invert them
341 // appropriately).
342 //
343 // Additionally, there is a new, parallel interface for loading files as
344 // (linear) floats to preserve the full dynamic range:
345 //
346 // float *data = stbi_loadf(filename, &x, &y, &n, 0);
347 //
348 // If you load LDR images through this interface, those images will
349 // be promoted to floating point values, run through the inverse of
350 // constants corresponding to the above:
351 //
352 // stbi_ldr_to_hdr_scale(1.0f);
353 // stbi_ldr_to_hdr_gamma(2.2f);
354 //
355 // Finally, given a filename (or an open file or memory block--see header
356 // file for details) containing image data, you can query for the "most
357 // appropriate" interface to use (that is, whether the image is HDR or
358 // not), using:
359 //
360 // stbi_is_hdr(char *filename);
361 //
362 // ===========================================================================
363 //
364 // iPhone PNG support:
365 //
366 // By default we convert iphone-formatted PNGs back to RGB, even though
367 // they are internally encoded differently. You can disable this conversion
368 // by by calling stbi_convert_iphone_png_to_rgb(0), in which case
369 // you will always just get the native iphone "format" through (which
370 // is BGR stored in RGB).
371 //
372 // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
373 // pixel to remove any premultiplied alpha *only* if the image file explicitly
374 // says there's premultiplied data (currently only happens in iPhone images,
375 // and only if iPhone convert-to-rgb processing is on).
376 //
377
378
379 #ifndef STBI_NO_STDIO
380 #include <stdio.h>
381 #endif // STBI_NO_STDIO
382
383 #define STBI_VERSION 1
384
385 enum
386 {
387 STBI_default = 0, // only used for req_comp
388
389 STBI_grey = 1,
390 STBI_grey_alpha = 2,
391 STBI_rgb = 3,
392 STBI_rgb_alpha = 4
393 };
394
395 typedef unsigned char stbi_uc;
396 typedef unsigned short stbi_us;
397
398 #ifdef __cplusplus
399 extern "C" {
400 #endif
401
402 #ifdef STB_IMAGE_STATIC
403 #define STBIDEF static
404 #else
405 #define STBIDEF extern
406 #endif
407
408 //////////////////////////////////////////////////////////////////////////////
409 //
410 // PRIMARY API - works on images of any type
411 //
412
413 //
414 // load image by filename, open file, or memory buffer
415 //
416
417 typedef struct
418 {
419 int (*read) (void *user,char *data,int size); // fill 'data' with 'size' bytes. return number of bytes actually read
420 void (*skip) (void *user,int n); // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
421 int (*eof) (void *user); // returns nonzero if we are at end of file/data
422 } stbi_io_callbacks;
423
424 ////////////////////////////////////
425 //
426 // 8-bits-per-channel interface
427 //
428
429 STBIDEF stbi_uc *stbi_load (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
430 STBIDEF stbi_uc *stbi_load_from_memory (stbi_uc const *buffer, int len , int *x, int *y, int *channels_in_file, int desired_channels);
431 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
432
433 #ifndef STBI_NO_STDIO
434 STBIDEF stbi_uc *stbi_load_from_file (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
435 // for stbi_load_from_file, file pointer is left pointing immediately after image
436 #endif
437
438 ////////////////////////////////////
439 //
440 // 16-bits-per-channel interface
441 //
442
443 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
444 #ifndef STBI_NO_STDIO
445 STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
446 #endif
447 // @TODO the other variants
448
449 ////////////////////////////////////
450 //
451 // float-per-channel interface
452 //
453 #ifndef STBI_NO_LINEAR
454 STBIDEF float *stbi_loadf (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
455 STBIDEF float *stbi_loadf_from_memory (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
456 STBIDEF float *stbi_loadf_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
457
458 #ifndef STBI_NO_STDIO
459 STBIDEF float *stbi_loadf_from_file (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
460 #endif
461 #endif
462
463 #ifndef STBI_NO_HDR
464 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma);
465 STBIDEF void stbi_hdr_to_ldr_scale(float scale);
466 #endif // STBI_NO_HDR
467
468 #ifndef STBI_NO_LINEAR
469 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma);
470 STBIDEF void stbi_ldr_to_hdr_scale(float scale);
471 #endif // STBI_NO_LINEAR
472
473 // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
474 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
475 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
476 #ifndef STBI_NO_STDIO
477 STBIDEF int stbi_is_hdr (char const *filename);
478 STBIDEF int stbi_is_hdr_from_file(FILE *f);
479 #endif // STBI_NO_STDIO
480
481
482 // get a VERY brief reason for failure
483 // NOT THREADSAFE
484 STBIDEF const char *stbi_failure_reason (void);
485
486 // free the loaded image -- this is just free()
487 STBIDEF void stbi_image_free (void *retval_from_stbi_load);
488
489 // get image dimensions & components without fully decoding
490 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
491 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
492
493 #ifndef STBI_NO_STDIO
494 STBIDEF int stbi_info (char const *filename, int *x, int *y, int *comp);
495 STBIDEF int stbi_info_from_file (FILE *f, int *x, int *y, int *comp);
496
497 #endif
498
499
500
501 // for image formats that explicitly notate that they have premultiplied alpha,
502 // we just return the colors as stored in the file. set this flag to force
503 // unpremultiplication. results are undefined if the unpremultiply overflow.
504 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
505
506 // indicate whether we should process iphone images back to canonical format,
507 // or just pass them through "as-is"
508 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
509
510 // flip the image vertically, so the first pixel in the output array is the bottom left
511 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
512
513 // ZLIB client - used by PNG, available for other purposes
514
515 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
516 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
517 STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
518 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
519
520 STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
521 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
522
523
524 #ifdef __cplusplus
525 }
526 #endif
527
528 //
529 //
530 //// end header file /////////////////////////////////////////////////////
531 #endif // STBI_INCLUDE_STB_IMAGE_H
532
533 #ifdef STB_IMAGE_IMPLEMENTATION
534
535 #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
536 || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
537 || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
538 || defined(STBI_ONLY_ZLIB)
539 #ifndef STBI_ONLY_JPEG
540 #define STBI_NO_JPEG
541 #endif
542 #ifndef STBI_ONLY_PNG
543 #define STBI_NO_PNG
544 #endif
545 #ifndef STBI_ONLY_BMP
546 #define STBI_NO_BMP
547 #endif
548 #ifndef STBI_ONLY_PSD
549 #define STBI_NO_PSD
550 #endif
551 #ifndef STBI_ONLY_TGA
552 #define STBI_NO_TGA
553 #endif
554 #ifndef STBI_ONLY_GIF
555 #define STBI_NO_GIF
556 #endif
557 #ifndef STBI_ONLY_HDR
558 #define STBI_NO_HDR
559 #endif
560 #ifndef STBI_ONLY_PIC
561 #define STBI_NO_PIC
562 #endif
563 #ifndef STBI_ONLY_PNM
564 #define STBI_NO_PNM
565 #endif
566 #endif
567
568 #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
569 #define STBI_NO_ZLIB
570 #endif
571
572
573 #include <stdarg.h>
574 #include <stddef.h> // ptrdiff_t on osx
575 #include <stdlib.h>
576 #include <string.h>
577 #include <limits.h>
578
579 #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
580 #include <math.h> // ldexp
581 #endif
582
583 #ifndef STBI_NO_STDIO
584 #include <stdio.h>
585 #endif
586
587 #ifndef STBI_ASSERT
588 #include <assert.h>
589 #define STBI_ASSERT(x) assert(x)
590 #endif
591
592
593 #ifndef _MSC_VER
594 #ifdef __cplusplus
595 #define stbi_inline inline
596 #else
597 #define stbi_inline
598 #endif
599 #else
600 #define stbi_inline __forceinline
601 #endif
602
603
604 #ifdef _MSC_VER
605 typedef unsigned short stbi__uint16;
606 typedef signed short stbi__int16;
607 typedef unsigned int stbi__uint32;
608 typedef signed int stbi__int32;
609 #else
610 #include <stdint.h>
611 typedef uint16_t stbi__uint16;
612 typedef int16_t stbi__int16;
613 typedef uint32_t stbi__uint32;
614 typedef int32_t stbi__int32;
615 #endif
616
617 // should produce compiler error if size is wrong
618 typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
619
620 #ifdef _MSC_VER
621 #define STBI_NOTUSED(v) (void)(v)
622 #else
623 #define STBI_NOTUSED(v) (void)sizeof(v)
624 #endif
625
626 #ifdef _MSC_VER
627 #define STBI_HAS_LROTL
628 #endif
629
630 #ifdef STBI_HAS_LROTL
631 #define stbi_lrot(x,y) _lrotl(x,y)
632 #else
633 #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (32 - (y))))
634 #endif
635
636 #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
637 // ok
638 #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
639 // ok
640 #else
641 #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
642 #endif
643
644 #ifndef STBI_MALLOC
645 #define STBI_MALLOC(sz) malloc(sz)
646 #define STBI_REALLOC(p,newsz) realloc(p,newsz)
647 #define STBI_FREE(p) free(p)
648 #endif
649
650 #ifndef STBI_REALLOC_SIZED
651 #define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
652 #endif
653
654 // x86/x64 detection
655 #if defined(__x86_64__) || defined(_M_X64)
656 #define STBI__X64_TARGET
657 #elif defined(__i386) || defined(_M_IX86)
658 #define STBI__X86_TARGET
659 #endif
660
661 #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
662 // NOTE: not clear do we actually need this for the 64-bit path?
663 // gcc doesn't support sse2 intrinsics unless you compile with -msse2,
664 // (but compiling with -msse2 allows the compiler to use SSE2 everywhere;
665 // this is just broken and gcc are jerks for not fixing it properly
666 // http://www.virtualdub.org/blog/pivot/entry.php?id=363 )
667 #define STBI_NO_SIMD
668 #endif
669
670 #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
671 // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
672 //
673 // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
674 // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
675 // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
676 // simultaneously enabling "-mstackrealign".
677 //
678 // See https://github.com/nothings/stb/issues/81 for more information.
679 //
680 // So default to no SSE2 on 32-bit MinGW. If you've read this far and added
681 // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
682 #define STBI_NO_SIMD
683 #endif
684
685 #if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
686 #define STBI_SSE2
687 #include <emmintrin.h>
688
689 #ifdef _MSC_VER
690
691 #if _MSC_VER >= 1400 // not VC6
692 #include <intrin.h> // __cpuid
693 static int stbi__cpuid3(void)
694 {
695 int info[4];
696 __cpuid(info,1);
697 return info[3];
698 }
699 #else
700 static int stbi__cpuid3(void)
701 {
702 int res;
703 __asm {
704 mov eax,1
705 cpuid
706 mov res,edx
707 }
708 return res;
709 }
710 #endif
711
712 #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
713
714 static int stbi__sse2_available()
715 {
716 int info3 = stbi__cpuid3();
717 return ((info3 >> 26) & 1) != 0;
718 }
719 #else // assume GCC-style if not VC++
720 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
721
722 static int stbi__sse2_available()
723 {
724 #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later
725 // GCC 4.8+ has a nice way to do this
726 return __builtin_cpu_supports("sse2");
727 #else
728 // portable way to do this, preferably without using GCC inline ASM?
729 // just bail for now.
730 return 0;
731 #endif
732 }
733 #endif
734 #endif
735
736 // ARM NEON
737 #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
738 #undef STBI_NEON
739 #endif
740
741 #ifdef STBI_NEON
742 #include <arm_neon.h>
743 // assume GCC or Clang on ARM targets
744 #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
745 #endif
746
747 #ifndef STBI_SIMD_ALIGN
748 #define STBI_SIMD_ALIGN(type, name) type name
749 #endif
750
751 ///////////////////////////////////////////////
752 //
753 // stbi__context struct and start_xxx functions
754
755 // stbi__context structure is our basic context used by all images, so it
756 // contains all the IO context, plus some basic image information
757 typedef struct
758 {
759 stbi__uint32 img_x, img_y;
760 int img_n, img_out_n;
761
762 stbi_io_callbacks io;
763 void *io_user_data;
764
765 int read_from_callbacks;
766 int buflen;
767 stbi_uc buffer_start[128];
768
769 stbi_uc *img_buffer, *img_buffer_end;
770 stbi_uc *img_buffer_original, *img_buffer_original_end;
771 } stbi__context;
772
773
774 static void stbi__refill_buffer(stbi__context *s);
775
776 // initialize a memory-decode context
777 static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
778 {
779 s->io.read = NULL;
780 s->read_from_callbacks = 0;
781 s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
782 s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
783 }
784
785 // initialize a callback-based context
786 static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
787 {
788 s->io = *c;
789 s->io_user_data = user;
790 s->buflen = sizeof(s->buffer_start);
791 s->read_from_callbacks = 1;
792 s->img_buffer_original = s->buffer_start;
793 stbi__refill_buffer(s);
794 s->img_buffer_original_end = s->img_buffer_end;
795 }
796
797 #ifndef STBI_NO_STDIO
798
799 static int stbi__stdio_read(void *user, char *data, int size)
800 {
801 return (int) fread(data,1,size,(FILE*) user);
802 }
803
804 static void stbi__stdio_skip(void *user, int n)
805 {
806 fseek((FILE*) user, n, SEEK_CUR);
807 }
808
809 static int stbi__stdio_eof(void *user)
810 {
811 return feof((FILE*) user);
812 }
813
814 static stbi_io_callbacks stbi__stdio_callbacks =
815 {
816 stbi__stdio_read,
817 stbi__stdio_skip,
818 stbi__stdio_eof,
819 };
820
821 static void stbi__start_file(stbi__context *s, FILE *f)
822 {
823 stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
824 }
825
826 //static void stop_file(stbi__context *s) { }
827
828 #endif // !STBI_NO_STDIO
829
830 static void stbi__rewind(stbi__context *s)
831 {
832 // conceptually rewind SHOULD rewind to the beginning of the stream,
833 // but we just rewind to the beginning of the initial buffer, because
834 // we only use it after doing 'test', which only ever looks at at most 92 bytes
835 s->img_buffer = s->img_buffer_original;
836 s->img_buffer_end = s->img_buffer_original_end;
837 }
838
839 enum
840 {
841 STBI_ORDER_RGB,
842 STBI_ORDER_BGR
843 };
844
845 typedef struct
846 {
847 int bits_per_channel;
848 int num_channels;
849 int channel_order;
850 } stbi__result_info;
851
852 #ifndef STBI_NO_JPEG
853 static int stbi__jpeg_test(stbi__context *s);
854 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
855 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
856 #endif
857
858 #ifndef STBI_NO_PNG
859 static int stbi__png_test(stbi__context *s);
860 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
861 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
862 #endif
863
864 #ifndef STBI_NO_BMP
865 static int stbi__bmp_test(stbi__context *s);
866 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
867 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
868 #endif
869
870 #ifndef STBI_NO_TGA
871 static int stbi__tga_test(stbi__context *s);
872 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
873 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
874 #endif
875
876 #ifndef STBI_NO_PSD
877 static int stbi__psd_test(stbi__context *s);
878 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
879 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
880 #endif
881
882 #ifndef STBI_NO_HDR
883 static int stbi__hdr_test(stbi__context *s);
884 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
885 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
886 #endif
887
888 #ifndef STBI_NO_PIC
889 static int stbi__pic_test(stbi__context *s);
890 static void *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
891 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
892 #endif
893
894 #ifndef STBI_NO_GIF
895 static int stbi__gif_test(stbi__context *s);
896 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
897 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
898 #endif
899
900 #ifndef STBI_NO_PNM
901 static int stbi__pnm_test(stbi__context *s);
902 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
903 static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
904 #endif
905
906 // this is not threadsafe
907 static const char *stbi__g_failure_reason;
908
909 STBIDEF const char *stbi_failure_reason(void)
910 {
911 return stbi__g_failure_reason;
912 }
913
914 static int stbi__err(const char *str)
915 {
916 stbi__g_failure_reason = str;
917 return 0;
918 }
919
920 static void *stbi__malloc(size_t size)
921 {
922 return STBI_MALLOC(size);
923 }
924
925 // stb_image uses ints pervasively, including for offset calculations.
926 // therefore the largest decoded image size we can support with the
927 // current code, even on 64-bit targets, is INT_MAX. this is not a
928 // significant limitation for the intended use case.
929 //
930 // we do, however, need to make sure our size calculations don't
931 // overflow. hence a few helper functions for size calculations that
932 // multiply integers together, making sure that they're non-negative
933 // and no overflow occurs.
934
935 // return 1 if the sum is valid, 0 on overflow.
936 // negative terms are considered invalid.
937 static int stbi__addsizes_valid(int a, int b)
938 {
939 if (b < 0) return 0;
940 // now 0 <= b <= INT_MAX, hence also
941 // 0 <= INT_MAX - b <= INTMAX.
942 // And "a + b <= INT_MAX" (which might overflow) is the
943 // same as a <= INT_MAX - b (no overflow)
944 return a <= INT_MAX - b;
945 }
946
947 // returns 1 if the product is valid, 0 on overflow.
948 // negative factors are considered invalid.
949 static int stbi__mul2sizes_valid(int a, int b)
950 {
951 if (a < 0 || b < 0) return 0;
952 if (b == 0) return 1; // mul-by-0 is always safe
953 // portable way to check for no overflows in a*b
954 return a <= INT_MAX/b;
955 }
956
957 // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
958 static int stbi__mad2sizes_valid(int a, int b, int add)
959 {
960 return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
961 }
962
963 // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
964 static int stbi__mad3sizes_valid(int a, int b, int c, int add)
965 {
966 return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
967 stbi__addsizes_valid(a*b*c, add);
968 }
969
970 // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
971 static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
972 {
973 return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
974 stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
975 }
976
977 // mallocs with size overflow checking
978 static void *stbi__malloc_mad2(int a, int b, int add)
979 {
980 if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
981 return stbi__malloc(a*b + add);
982 }
983
984 static void *stbi__malloc_mad3(int a, int b, int c, int add)
985 {
986 if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
987 return stbi__malloc(a*b*c + add);
988 }
989
990 static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
991 {
992 if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
993 return stbi__malloc(a*b*c*d + add);
994 }
995
996 // stbi__err - error
997 // stbi__errpf - error returning pointer to float
998 // stbi__errpuc - error returning pointer to unsigned char
999
1000 #ifdef STBI_NO_FAILURE_STRINGS
1001 #define stbi__err(x,y) 0
1002 #elif defined(STBI_FAILURE_USERMSG)
1003 #define stbi__err(x,y) stbi__err(y)
1004 #else
1005 #define stbi__err(x,y) stbi__err(x)
1006 #endif
1007
1008 #define stbi__errpf(x,y) ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
1009 #define stbi__errpuc(x,y) ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
1010
1011 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
1012 {
1013 STBI_FREE(retval_from_stbi_load);
1014 }
1015
1016 #ifndef STBI_NO_LINEAR
1017 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
1018 #endif
1019
1020 #ifndef STBI_NO_HDR
1021 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp);
1022 #endif
1023
1024 static int stbi__vertically_flip_on_load = 0;
1025
1026 STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
1027 {
1028 stbi__vertically_flip_on_load = flag_true_if_should_flip;
1029 }
1030
1031 static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
1032 {
1033 memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
1034 ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
1035 ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
1036 ri->num_channels = 0;
1037
1038 #ifndef STBI_NO_JPEG
1039 if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
1040 #endif
1041 #ifndef STBI_NO_PNG
1042 if (stbi__png_test(s)) return stbi__png_load(s,x,y,comp,req_comp, ri);
1043 #endif
1044 #ifndef STBI_NO_BMP
1045 if (stbi__bmp_test(s)) return stbi__bmp_load(s,x,y,comp,req_comp, ri);
1046 #endif
1047 #ifndef STBI_NO_GIF
1048 if (stbi__gif_test(s)) return stbi__gif_load(s,x,y,comp,req_comp, ri);
1049 #endif
1050 #ifndef STBI_NO_PSD
1051 if (stbi__psd_test(s)) return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
1052 #endif
1053 #ifndef STBI_NO_PIC
1054 if (stbi__pic_test(s)) return stbi__pic_load(s,x,y,comp,req_comp, ri);
1055 #endif
1056 #ifndef STBI_NO_PNM
1057 if (stbi__pnm_test(s)) return stbi__pnm_load(s,x,y,comp,req_comp, ri);
1058 #endif
1059
1060 #ifndef STBI_NO_HDR
1061 if (stbi__hdr_test(s)) {
1062 float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
1063 return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
1064 }
1065 #endif
1066
1067 #ifndef STBI_NO_TGA
1068 // test tga last because it's a crappy test!
1069 if (stbi__tga_test(s))
1070 return stbi__tga_load(s,x,y,comp,req_comp, ri);
1071 #endif
1072
1073 return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
1074 }
1075
1076 static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
1077 {
1078 int i;
1079 int img_len = w * h * channels;
1080 stbi_uc *reduced;
1081
1082 reduced = (stbi_uc *) stbi__malloc(img_len);
1083 if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
1084
1085 for (i = 0; i < img_len; ++i)
1086 reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
1087
1088 STBI_FREE(orig);
1089 return reduced;
1090 }
1091
1092 static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
1093 {
1094 int i;
1095 int img_len = w * h * channels;
1096 stbi__uint16 *enlarged;
1097
1098 enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
1099 if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1100
1101 for (i = 0; i < img_len; ++i)
1102 enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
1103
1104 STBI_FREE(orig);
1105 return enlarged;
1106 }
1107
1108 static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1109 {
1110 stbi__result_info ri;
1111 void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
1112
1113 if (result == NULL)
1114 return NULL;
1115
1116 if (ri.bits_per_channel != 8) {
1117 STBI_ASSERT(ri.bits_per_channel == 16);
1118 result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1119 ri.bits_per_channel = 8;
1120 }
1121
1122 // @TODO: move stbi__convert_format to here
1123
1124 if (stbi__vertically_flip_on_load) {
1125 int w = *x, h = *y;
1126 int channels = req_comp ? req_comp : *comp;
1127 int row,col,z;
1128 stbi_uc *image = (stbi_uc *) result;
1129
1130 // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1131 for (row = 0; row < (h>>1); row++) {
1132 for (col = 0; col < w; col++) {
1133 for (z = 0; z < channels; z++) {
1134 stbi_uc temp = image[(row * w + col) * channels + z];
1135 image[(row * w + col) * channels + z] = image[((h - row - 1) * w + col) * channels + z];
1136 image[((h - row - 1) * w + col) * channels + z] = temp;
1137 }
1138 }
1139 }
1140 }
1141
1142 return (unsigned char *) result;
1143 }
1144
1145 static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1146 {
1147 stbi__result_info ri;
1148 void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
1149
1150 if (result == NULL)
1151 return NULL;
1152
1153 if (ri.bits_per_channel != 16) {
1154 STBI_ASSERT(ri.bits_per_channel == 8);
1155 result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
1156 ri.bits_per_channel = 16;
1157 }
1158
1159 // @TODO: move stbi__convert_format16 to here
1160 // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
1161
1162 if (stbi__vertically_flip_on_load) {
1163 int w = *x, h = *y;
1164 int channels = req_comp ? req_comp : *comp;
1165 int row,col,z;
1166 stbi__uint16 *image = (stbi__uint16 *) result;
1167
1168 // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1169 for (row = 0; row < (h>>1); row++) {
1170 for (col = 0; col < w; col++) {
1171 for (z = 0; z < channels; z++) {
1172 stbi__uint16 temp = image[(row * w + col) * channels + z];
1173 image[(row * w + col) * channels + z] = image[((h - row - 1) * w + col) * channels + z];
1174 image[((h - row - 1) * w + col) * channels + z] = temp;
1175 }
1176 }
1177 }
1178 }
1179
1180 return (stbi__uint16 *) result;
1181 }
1182
1183 #ifndef STBI_NO_HDR
1184 static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1185 {
1186 if (stbi__vertically_flip_on_load && result != NULL) {
1187 int w = *x, h = *y;
1188 int depth = req_comp ? req_comp : *comp;
1189 int row,col,z;
1190 float temp;
1191
1192 // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1193 for (row = 0; row < (h>>1); row++) {
1194 for (col = 0; col < w; col++) {
1195 for (z = 0; z < depth; z++) {
1196 temp = result[(row * w + col) * depth + z];
1197 result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1198 result[((h - row - 1) * w + col) * depth + z] = temp;
1199 }
1200 }
1201 }
1202 }
1203 }
1204 #endif
1205
1206 #ifndef STBI_NO_STDIO
1207
1208 static FILE *stbi__fopen(char const *filename, char const *mode)
1209 {
1210 FILE *f;
1211 #if defined(_MSC_VER) && _MSC_VER >= 1400
1212 if (0 != fopen_s(&f, filename, mode))
1213 f=0;
1214 #else
1215 f = fopen(filename, mode);
1216 #endif
1217 return f;
1218 }
1219
1220
1221 STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1222 {
1223 FILE *f = stbi__fopen(filename, "rb");
1224 unsigned char *result;
1225 if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
1226 result = stbi_load_from_file(f,x,y,comp,req_comp);
1227 fclose(f);
1228 return result;
1229 }
1230
1231 STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1232 {
1233 unsigned char *result;
1234 stbi__context s;
1235 stbi__start_file(&s,f);
1236 result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1237 if (result) {
1238 // need to 'unget' all the characters in the IO buffer
1239 fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1240 }
1241 return result;
1242 }
1243
1244 STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
1245 {
1246 stbi__uint16 *result;
1247 stbi__context s;
1248 stbi__start_file(&s,f);
1249 result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
1250 if (result) {
1251 // need to 'unget' all the characters in the IO buffer
1252 fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
1253 }
1254 return result;
1255 }
1256
1257 STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
1258 {
1259 FILE *f = stbi__fopen(filename, "rb");
1260 stbi__uint16 *result;
1261 if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
1262 result = stbi_load_from_file_16(f,x,y,comp,req_comp);
1263 fclose(f);
1264 return result;
1265 }
1266
1267
1268 #endif //!STBI_NO_STDIO
1269
1270 STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1271 {
1272 stbi__context s;
1273 stbi__start_mem(&s,buffer,len);
1274 return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1275 }
1276
1277 STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1278 {
1279 stbi__context s;
1280 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1281 return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
1282 }
1283
1284 #ifndef STBI_NO_LINEAR
1285 static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1286 {
1287 unsigned char *data;
1288 #ifndef STBI_NO_HDR
1289 if (stbi__hdr_test(s)) {
1290 stbi__result_info ri;
1291 float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
1292 if (hdr_data)
1293 stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1294 return hdr_data;
1295 }
1296 #endif
1297 data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
1298 if (data)
1299 return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
1300 return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
1301 }
1302
1303 STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1304 {
1305 stbi__context s;
1306 stbi__start_mem(&s,buffer,len);
1307 return stbi__loadf_main(&s,x,y,comp,req_comp);
1308 }
1309
1310 STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1311 {
1312 stbi__context s;
1313 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1314 return stbi__loadf_main(&s,x,y,comp,req_comp);
1315 }
1316
1317 #ifndef STBI_NO_STDIO
1318 STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1319 {
1320 float *result;
1321 FILE *f = stbi__fopen(filename, "rb");
1322 if (!f) return stbi__errpf("can't fopen", "Unable to open file");
1323 result = stbi_loadf_from_file(f,x,y,comp,req_comp);
1324 fclose(f);
1325 return result;
1326 }
1327
1328 STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1329 {
1330 stbi__context s;
1331 stbi__start_file(&s,f);
1332 return stbi__loadf_main(&s,x,y,comp,req_comp);
1333 }
1334 #endif // !STBI_NO_STDIO
1335
1336 #endif // !STBI_NO_LINEAR
1337
1338 // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1339 // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
1340 // reports false!
1341
1342 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1343 {
1344 #ifndef STBI_NO_HDR
1345 stbi__context s;
1346 stbi__start_mem(&s,buffer,len);
1347 return stbi__hdr_test(&s);
1348 #else
1349 STBI_NOTUSED(buffer);
1350 STBI_NOTUSED(len);
1351 return 0;
1352 #endif
1353 }
1354
1355 #ifndef STBI_NO_STDIO
1356 STBIDEF int stbi_is_hdr (char const *filename)
1357 {
1358 FILE *f = stbi__fopen(filename, "rb");
1359 int result=0;
1360 if (f) {
1361 result = stbi_is_hdr_from_file(f);
1362 fclose(f);
1363 }
1364 return result;
1365 }
1366
1367 STBIDEF int stbi_is_hdr_from_file(FILE *f)
1368 {
1369 #ifndef STBI_NO_HDR
1370 stbi__context s;
1371 stbi__start_file(&s,f);
1372 return stbi__hdr_test(&s);
1373 #else
1374 STBI_NOTUSED(f);
1375 return 0;
1376 #endif
1377 }
1378 #endif // !STBI_NO_STDIO
1379
1380 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
1381 {
1382 #ifndef STBI_NO_HDR
1383 stbi__context s;
1384 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
1385 return stbi__hdr_test(&s);
1386 #else
1387 STBI_NOTUSED(clbk);
1388 STBI_NOTUSED(user);
1389 return 0;
1390 #endif
1391 }
1392
1393 #ifndef STBI_NO_LINEAR
1394 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
1395
1396 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1397 STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1398 #endif
1399
1400 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
1401
1402 STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1403 STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
1404
1405
1406 //////////////////////////////////////////////////////////////////////////////
1407 //
1408 // Common code used by all image loaders
1409 //
1410
1411 enum
1412 {
1413 STBI__SCAN_load=0,
1414 STBI__SCAN_type,
1415 STBI__SCAN_header
1416 };
1417
1418 static void stbi__refill_buffer(stbi__context *s)
1419 {
1420 int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
1421 if (n == 0) {
1422 // at end of file, treat same as if from memory, but need to handle case
1423 // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
1424 s->read_from_callbacks = 0;
1425 s->img_buffer = s->buffer_start;
1426 s->img_buffer_end = s->buffer_start+1;
1427 *s->img_buffer = 0;
1428 } else {
1429 s->img_buffer = s->buffer_start;
1430 s->img_buffer_end = s->buffer_start + n;
1431 }
1432 }
1433
1434 stbi_inline static stbi_uc stbi__get8(stbi__context *s)
1435 {
1436 if (s->img_buffer < s->img_buffer_end)
1437 return *s->img_buffer++;
1438 if (s->read_from_callbacks) {
1439 stbi__refill_buffer(s);
1440 return *s->img_buffer++;
1441 }
1442 return 0;
1443 }
1444
1445 stbi_inline static int stbi__at_eof(stbi__context *s)
1446 {
1447 if (s->io.read) {
1448 if (!(s->io.eof)(s->io_user_data)) return 0;
1449 // if feof() is true, check if buffer = end
1450 // special case: we've only got the special 0 character at the end
1451 if (s->read_from_callbacks == 0) return 1;
1452 }
1453
1454 return s->img_buffer >= s->img_buffer_end;
1455 }
1456
1457 static void stbi__skip(stbi__context *s, int n)
1458 {
1459 if (n < 0) {
1460 s->img_buffer = s->img_buffer_end;
1461 return;
1462 }
1463 if (s->io.read) {
1464 int blen = (int) (s->img_buffer_end - s->img_buffer);
1465 if (blen < n) {
1466 s->img_buffer = s->img_buffer_end;
1467 (s->io.skip)(s->io_user_data, n - blen);
1468 return;
1469 }
1470 }
1471 s->img_buffer += n;
1472 }
1473
1474 static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
1475 {
1476 if (s->io.read) {
1477 int blen = (int) (s->img_buffer_end - s->img_buffer);
1478 if (blen < n) {
1479 int res, count;
1480
1481 memcpy(buffer, s->img_buffer, blen);
1482
1483 count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
1484 res = (count == (n-blen));
1485 s->img_buffer = s->img_buffer_end;
1486 return res;
1487 }
1488 }
1489
1490 if (s->img_buffer+n <= s->img_buffer_end) {
1491 memcpy(buffer, s->img_buffer, n);
1492 s->img_buffer += n;
1493 return 1;
1494 } else
1495 return 0;
1496 }
1497
1498 static int stbi__get16be(stbi__context *s)
1499 {
1500 int z = stbi__get8(s);
1501 return (z << 8) + stbi__get8(s);
1502 }
1503
1504 static stbi__uint32 stbi__get32be(stbi__context *s)
1505 {
1506 stbi__uint32 z = stbi__get16be(s);
1507 return (z << 16) + stbi__get16be(s);
1508 }
1509
1510 #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1511 // nothing
1512 #else
1513 static int stbi__get16le(stbi__context *s)
1514 {
1515 int z = stbi__get8(s);
1516 return z + (stbi__get8(s) << 8);
1517 }
1518 #endif
1519
1520 #ifndef STBI_NO_BMP
1521 static stbi__uint32 stbi__get32le(stbi__context *s)
1522 {
1523 stbi__uint32 z = stbi__get16le(s);
1524 return z + (stbi__get16le(s) << 16);
1525 }
1526 #endif
1527
1528 #define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte without warnings
1529
1530
1531 //////////////////////////////////////////////////////////////////////////////
1532 //
1533 // generic converter from built-in img_n to req_comp
1534 // individual types do this automatically as much as possible (e.g. jpeg
1535 // does all cases internally since it needs to colorspace convert anyway,
1536 // and it never has alpha, so very few cases ). png can automatically
1537 // interleave an alpha=255 channel, but falls back to this for other cases
1538 //
1539 // assume data buffer is malloced, so malloc a new one and free that one
1540 // only failure mode is malloc failing
1541
1542 static stbi_uc stbi__compute_y(int r, int g, int b)
1543 {
1544 return (stbi_uc) (((r*77) + (g*150) + (29*b)) >> 8);
1545 }
1546
1547 static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1548 {
1549 int i,j;
1550 unsigned char *good;
1551
1552 if (req_comp == img_n) return data;
1553 STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1554
1555 good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
1556 if (good == NULL) {
1557 STBI_FREE(data);
1558 return stbi__errpuc("outofmem", "Out of memory");
1559 }
1560
1561 for (j=0; j < (int) y; ++j) {
1562 unsigned char *src = data + j * x * img_n ;
1563 unsigned char *dest = good + j * x * req_comp;
1564
1565 #define STBI__COMBO(a,b) ((a)*8+(b))
1566 #define STBI__CASE(a,b) case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1567 // convert source image with img_n components to one with req_comp components;
1568 // avoid switch per pixel, so use switch per scanline and massive macros
1569 switch (STBI__COMBO(img_n, req_comp)) {
1570 STBI__CASE(1,2) { dest[0]=src[0], dest[1]=255; } break;
1571 STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
1572 STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; } break;
1573 STBI__CASE(2,1) { dest[0]=src[0]; } break;
1574 STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
1575 STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; } break;
1576 STBI__CASE(3,4) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; } break;
1577 STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); } break;
1578 STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; } break;
1579 STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); } break;
1580 STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; } break;
1581 STBI__CASE(4,3) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; } break;
1582 default: STBI_ASSERT(0);
1583 }
1584 #undef STBI__CASE
1585 }
1586
1587 STBI_FREE(data);
1588 return good;
1589 }
1590
1591 static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
1592 {
1593 return (stbi__uint16) (((r*77) + (g*150) + (29*b)) >> 8);
1594 }
1595
1596 static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
1597 {
1598 int i,j;
1599 stbi__uint16 *good;
1600
1601 if (req_comp == img_n) return data;
1602 STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
1603
1604 good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
1605 if (good == NULL) {
1606 STBI_FREE(data);
1607 return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
1608 }
1609
1610 for (j=0; j < (int) y; ++j) {
1611 stbi__uint16 *src = data + j * x * img_n ;
1612 stbi__uint16 *dest = good + j * x * req_comp;
1613
1614 #define STBI__COMBO(a,b) ((a)*8+(b))
1615 #define STBI__CASE(a,b) case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
1616 // convert source image with img_n components to one with req_comp components;
1617 // avoid switch per pixel, so use switch per scanline and massive macros
1618 switch (STBI__COMBO(img_n, req_comp)) {
1619 STBI__CASE(1,2) { dest[0]=src[0], dest[1]=0xffff; } break;
1620 STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
1621 STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=0xffff; } break;
1622 STBI__CASE(2,1) { dest[0]=src[0]; } break;
1623 STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0]; } break;
1624 STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; } break;
1625 STBI__CASE(3,4) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=0xffff; } break;
1626 STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); } break;
1627 STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]), dest[1] = 0xffff; } break;
1628 STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); } break;
1629 STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]), dest[1] = src[3]; } break;
1630 STBI__CASE(4,3) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; } break;
1631 default: STBI_ASSERT(0);
1632 }
1633 #undef STBI__CASE
1634 }
1635
1636 STBI_FREE(data);
1637 return good;
1638 }
1639
1640 #ifndef STBI_NO_LINEAR
1641 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
1642 {
1643 int i,k,n;
1644 float *output;
1645 if (!data) return NULL;
1646 output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
1647 if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
1648 // compute number of non-alpha components
1649 if (comp & 1) n = comp; else n = comp-1;
1650 for (i=0; i < x*y; ++i) {
1651 for (k=0; k < n; ++k) {
1652 output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
1653 }
1654 if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
1655 }
1656 STBI_FREE(data);
1657 return output;
1658 }
1659 #endif
1660
1661 #ifndef STBI_NO_HDR
1662 #define stbi__float2int(x) ((int) (x))
1663 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp)
1664 {
1665 int i,k,n;
1666 stbi_uc *output;
1667 if (!data) return NULL;
1668 output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
1669 if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
1670 // compute number of non-alpha components
1671 if (comp & 1) n = comp; else n = comp-1;
1672 for (i=0; i < x*y; ++i) {
1673 for (k=0; k < n; ++k) {
1674 float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
1675 if (z < 0) z = 0;
1676 if (z > 255) z = 255;
1677 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1678 }
1679 if (k < comp) {
1680 float z = data[i*comp+k] * 255 + 0.5f;
1681 if (z < 0) z = 0;
1682 if (z > 255) z = 255;
1683 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
1684 }
1685 }
1686 STBI_FREE(data);
1687 return output;
1688 }
1689 #endif
1690
1691 //////////////////////////////////////////////////////////////////////////////
1692 //
1693 // "baseline" JPEG/JFIF decoder
1694 //
1695 // simple implementation
1696 // - doesn't support delayed output of y-dimension
1697 // - simple interface (only one output format: 8-bit interleaved RGB)
1698 // - doesn't try to recover corrupt jpegs
1699 // - doesn't allow partial loading, loading multiple at once
1700 // - still fast on x86 (copying globals into locals doesn't help x86)
1701 // - allocates lots of intermediate memory (full size of all components)
1702 // - non-interleaved case requires this anyway
1703 // - allows good upsampling (see next)
1704 // high-quality
1705 // - upsampled channels are bilinearly interpolated, even across blocks
1706 // - quality integer IDCT derived from IJG's 'slow'
1707 // performance
1708 // - fast huffman; reasonable integer IDCT
1709 // - some SIMD kernels for common paths on targets with SSE2/NEON
1710 // - uses a lot of intermediate memory, could cache poorly
1711
1712 #ifndef STBI_NO_JPEG
1713
1714 // huffman decoding acceleration
1715 #define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
1716
1717 typedef struct
1718 {
1719 stbi_uc fast[1 << FAST_BITS];
1720 // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
1721 stbi__uint16 code[256];
1722 stbi_uc values[256];
1723 stbi_uc size[257];
1724 unsigned int maxcode[18];
1725 int delta[17]; // old 'firstsymbol' - old 'firstcode'
1726 } stbi__huffman;
1727
1728 typedef struct
1729 {
1730 stbi__context *s;
1731 stbi__huffman huff_dc[4];
1732 stbi__huffman huff_ac[4];
1733 stbi_uc dequant[4][64];
1734 stbi__int16 fast_ac[4][1 << FAST_BITS];
1735
1736 // sizes for components, interleaved MCUs
1737 int img_h_max, img_v_max;
1738 int img_mcu_x, img_mcu_y;
1739 int img_mcu_w, img_mcu_h;
1740
1741 // definition of jpeg image component
1742 struct
1743 {
1744 int id;
1745 int h,v;
1746 int tq;
1747 int hd,ha;
1748 int dc_pred;
1749
1750 int x,y,w2,h2;
1751 stbi_uc *data;
1752 void *raw_data, *raw_coeff;
1753 stbi_uc *linebuf;
1754 short *coeff; // progressive only
1755 int coeff_w, coeff_h; // number of 8x8 coefficient blocks
1756 } img_comp[4];
1757
1758 stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1759 int code_bits; // number of valid bits
1760 unsigned char marker; // marker seen while filling entropy buffer
1761 int nomore; // flag if we saw a marker so must stop
1762
1763 int progressive;
1764 int spec_start;
1765 int spec_end;
1766 int succ_high;
1767 int succ_low;
1768 int eob_run;
1769 int rgb;
1770
1771 int scan_n, order[4];
1772 int restart_interval, todo;
1773
1774 // kernels
1775 void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1776 void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1777 stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1778 } stbi__jpeg;
1779
1780 static int stbi__build_huffman(stbi__huffman *h, int *count)
1781 {
1782 int i,j,k=0,code;
1783 // build size list for each symbol (from JPEG spec)
1784 for (i=0; i < 16; ++i)
1785 for (j=0; j < count[i]; ++j)
1786 h->size[k++] = (stbi_uc) (i+1);
1787 h->size[k] = 0;
1788
1789 // compute actual symbols (from jpeg spec)
1790 code = 0;
1791 k = 0;
1792 for(j=1; j <= 16; ++j) {
1793 // compute delta to add to code to compute symbol id
1794 h->delta[j] = k - code;
1795 if (h->size[k] == j) {
1796 while (h->size[k] == j)
1797 h->code[k++] = (stbi__uint16) (code++);
1798 if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG");
1799 }
1800 // compute largest code + 1 for this size, preshifted as needed later
1801 h->maxcode[j] = code << (16-j);
1802 code <<= 1;
1803 }
1804 h->maxcode[j] = 0xffffffff;
1805
1806 // build non-spec acceleration table; 255 is flag for not-accelerated
1807 memset(h->fast, 255, 1 << FAST_BITS);
1808 for (i=0; i < k; ++i) {
1809 int s = h->size[i];
1810 if (s <= FAST_BITS) {
1811 int c = h->code[i] << (FAST_BITS-s);
1812 int m = 1 << (FAST_BITS-s);
1813 for (j=0; j < m; ++j) {
1814 h->fast[c+j] = (stbi_uc) i;
1815 }
1816 }
1817 }
1818 return 1;
1819 }
1820
1821 // build a table that decodes both magnitude and value of small ACs in
1822 // one go.
1823 static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1824 {
1825 int i;
1826 for (i=0; i < (1 << FAST_BITS); ++i) {
1827 stbi_uc fast = h->fast[i];
1828 fast_ac[i] = 0;
1829 if (fast < 255) {
1830 int rs = h->values[fast];
1831 int run = (rs >> 4) & 15;
1832 int magbits = rs & 15;
1833 int len = h->size[fast];
1834
1835 if (magbits && len + magbits <= FAST_BITS) {
1836 // magnitude code followed by receive_extend code
1837 int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1838 int m = 1 << (magbits - 1);
1839 if (k < m) k += (-1 << magbits) + 1;
1840 // if the result is small enough, we can fit it in fast_ac table
1841 if (k >= -128 && k <= 127)
1842 fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1843 }
1844 }
1845 }
1846 }
1847
1848 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1849 {
1850 do {
1851 int b = j->nomore ? 0 : stbi__get8(j->s);
1852 if (b == 0xff) {
1853 int c = stbi__get8(j->s);
1854 if (c != 0) {
1855 j->marker = (unsigned char) c;
1856 j->nomore = 1;
1857 return;
1858 }
1859 }
1860 j->code_buffer |= b << (24 - j->code_bits);
1861 j->code_bits += 8;
1862 } while (j->code_bits <= 24);
1863 }
1864
1865 // (1 << n) - 1
1866 static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
1867
1868 // decode a jpeg huffman value from the bitstream
1869 stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
1870 {
1871 unsigned int temp;
1872 int c,k;
1873
1874 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1875
1876 // look at the top FAST_BITS and determine what symbol ID it is,
1877 // if the code is <= FAST_BITS
1878 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1879 k = h->fast[c];
1880 if (k < 255) {
1881 int s = h->size[k];
1882 if (s > j->code_bits)
1883 return -1;
1884 j->code_buffer <<= s;
1885 j->code_bits -= s;
1886 return h->values[k];
1887 }
1888
1889 // naive test is to shift the code_buffer down so k bits are
1890 // valid, then test against maxcode. To speed this up, we've
1891 // preshifted maxcode left so that it has (16-k) 0s at the
1892 // end; in other words, regardless of the number of bits, it
1893 // wants to be compared against something shifted to have 16;
1894 // that way we don't need to shift inside the loop.
1895 temp = j->code_buffer >> 16;
1896 for (k=FAST_BITS+1 ; ; ++k)
1897 if (temp < h->maxcode[k])
1898 break;
1899 if (k == 17) {
1900 // error! code not found
1901 j->code_bits -= 16;
1902 return -1;
1903 }
1904
1905 if (k > j->code_bits)
1906 return -1;
1907
1908 // convert the huffman code to the symbol id
1909 c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
1910 STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
1911
1912 // convert the id to a symbol
1913 j->code_bits -= k;
1914 j->code_buffer <<= k;
1915 return h->values[c];
1916 }
1917
1918 // bias[n] = (-1<<n) + 1
1919 static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
1920
1921 // combined JPEG 'receive' and JPEG 'extend', since baseline
1922 // always extends everything it receives.
1923 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1924 {
1925 unsigned int k;
1926 int sgn;
1927 if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1928
1929 sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1930 k = stbi_lrot(j->code_buffer, n);
1931 STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
1932 j->code_buffer = k & ~stbi__bmask[n];
1933 k &= stbi__bmask[n];
1934 j->code_bits -= n;
1935 return k + (stbi__jbias[n] & ~sgn);
1936 }
1937
1938 // get some unsigned bits
1939 stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1940 {
1941 unsigned int k;
1942 if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1943 k = stbi_lrot(j->code_buffer, n);
1944 j->code_buffer = k & ~stbi__bmask[n];
1945 k &= stbi__bmask[n];
1946 j->code_bits -= n;
1947 return k;
1948 }
1949
1950 stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1951 {
1952 unsigned int k;
1953 if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1954 k = j->code_buffer;
1955 j->code_buffer <<= 1;
1956 --j->code_bits;
1957 return k & 0x80000000;
1958 }
1959
1960 // given a value that's at position X in the zigzag stream,
1961 // where does it appear in the 8x8 matrix coded as row-major?
1962 static stbi_uc stbi__jpeg_dezigzag[64+15] =
1963 {
1964 0, 1, 8, 16, 9, 2, 3, 10,
1965 17, 24, 32, 25, 18, 11, 4, 5,
1966 12, 19, 26, 33, 40, 48, 41, 34,
1967 27, 20, 13, 6, 7, 14, 21, 28,
1968 35, 42, 49, 56, 57, 50, 43, 36,
1969 29, 22, 15, 23, 30, 37, 44, 51,
1970 58, 59, 52, 45, 38, 31, 39, 46,
1971 53, 60, 61, 54, 47, 55, 62, 63,
1972 // let corrupt input sample past end
1973 63, 63, 63, 63, 63, 63, 63, 63,
1974 63, 63, 63, 63, 63, 63, 63
1975 };
1976
1977 // decode one 64-entry block--
1978 static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1979 {
1980 int diff,dc,k;
1981 int t;
1982
1983 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1984 t = stbi__jpeg_huff_decode(j, hdc);
1985 if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1986
1987 // 0 all the ac values now so we can do it 32-bits at a time
1988 memset(data,0,64*sizeof(data[0]));
1989
1990 diff = t ? stbi__extend_receive(j, t) : 0;
1991 dc = j->img_comp[b].dc_pred + diff;
1992 j->img_comp[b].dc_pred = dc;
1993 data[0] = (short) (dc * dequant[0]);
1994
1995 // decode AC components, see JPEG spec
1996 k = 1;
1997 do {
1998 unsigned int zig;
1999 int c,r,s;
2000 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2001 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2002 r = fac[c];
2003 if (r) { // fast-AC path
2004 k += (r >> 4) & 15; // run
2005 s = r & 15; // combined length
2006 j->code_buffer <<= s;
2007 j->code_bits -= s;
2008 // decode into unzigzag'd location
2009 zig = stbi__jpeg_dezigzag[k++];
2010 data[zig] = (short) ((r >> 8) * dequant[zig]);
2011 } else {
2012 int rs = stbi__jpeg_huff_decode(j, hac);
2013 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2014 s = rs & 15;
2015 r = rs >> 4;
2016 if (s == 0) {
2017 if (rs != 0xf0) break; // end block
2018 k += 16;
2019 } else {
2020 k += r;
2021 // decode into unzigzag'd location
2022 zig = stbi__jpeg_dezigzag[k++];
2023 data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
2024 }
2025 }
2026 } while (k < 64);
2027 return 1;
2028 }
2029
2030 static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
2031 {
2032 int diff,dc;
2033 int t;
2034 if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2035
2036 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2037
2038 if (j->succ_high == 0) {
2039 // first scan for DC coefficient, must be first
2040 memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
2041 t = stbi__jpeg_huff_decode(j, hdc);
2042 diff = t ? stbi__extend_receive(j, t) : 0;
2043
2044 dc = j->img_comp[b].dc_pred + diff;
2045 j->img_comp[b].dc_pred = dc;
2046 data[0] = (short) (dc << j->succ_low);
2047 } else {
2048 // refinement scan for DC coefficient
2049 if (stbi__jpeg_get_bit(j))
2050 data[0] += (short) (1 << j->succ_low);
2051 }
2052 return 1;
2053 }
2054
2055 // @OPTIMIZE: store non-zigzagged during the decode passes,
2056 // and only de-zigzag when dequantizing
2057 static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
2058 {
2059 int k;
2060 if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
2061
2062 if (j->succ_high == 0) {
2063 int shift = j->succ_low;
2064
2065 if (j->eob_run) {
2066 --j->eob_run;
2067 return 1;
2068 }
2069
2070 k = j->spec_start;
2071 do {
2072 unsigned int zig;
2073 int c,r,s;
2074 if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
2075 c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
2076 r = fac[c];
2077 if (r) { // fast-AC path
2078 k += (r >> 4) & 15; // run
2079 s = r & 15; // combined length
2080 j->code_buffer <<= s;
2081 j->code_bits -= s;
2082 zig = stbi__jpeg_dezigzag[k++];
2083 data[zig] = (short) ((r >> 8) << shift);
2084 } else {
2085 int rs = stbi__jpeg_huff_decode(j, hac);
2086 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2087 s = rs & 15;
2088 r = rs >> 4;
2089 if (s == 0) {
2090 if (r < 15) {
2091 j->eob_run = (1 << r);
2092 if (r)
2093 j->eob_run += stbi__jpeg_get_bits(j, r);
2094 --j->eob_run;
2095 break;
2096 }
2097 k += 16;
2098 } else {
2099 k += r;
2100 zig = stbi__jpeg_dezigzag[k++];
2101 data[zig] = (short) (stbi__extend_receive(j,s) << shift);
2102 }
2103 }
2104 } while (k <= j->spec_end);
2105 } else {
2106 // refinement scan for these AC coefficients
2107
2108 short bit = (short) (1 << j->succ_low);
2109
2110 if (j->eob_run) {
2111 --j->eob_run;
2112 for (k = j->spec_start; k <= j->spec_end; ++k) {
2113 short *p = &data[stbi__jpeg_dezigzag[k]];
2114 if (*p != 0)
2115 if (stbi__jpeg_get_bit(j))
2116 if ((*p & bit)==0) {
2117 if (*p > 0)
2118 *p += bit;
2119 else
2120 *p -= bit;
2121 }
2122 }
2123 } else {
2124 k = j->spec_start;
2125 do {
2126 int r,s;
2127 int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
2128 if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
2129 s = rs & 15;
2130 r = rs >> 4;
2131 if (s == 0) {
2132 if (r < 15) {
2133 j->eob_run = (1 << r) - 1;
2134 if (r)
2135 j->eob_run += stbi__jpeg_get_bits(j, r);
2136 r = 64; // force end of block
2137 } else {
2138 // r=15 s=0 should write 16 0s, so we just do
2139 // a run of 15 0s and then write s (which is 0),
2140 // so we don't have to do anything special here
2141 }
2142 } else {
2143 if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
2144 // sign bit
2145 if (stbi__jpeg_get_bit(j))
2146 s = bit;
2147 else
2148 s = -bit;
2149 }
2150
2151 // advance by r
2152 while (k <= j->spec_end) {
2153 short *p = &data[stbi__jpeg_dezigzag[k++]];
2154 if (*p != 0) {
2155 if (stbi__jpeg_get_bit(j))
2156 if ((*p & bit)==0) {
2157 if (*p > 0)
2158 *p += bit;
2159 else
2160 *p -= bit;
2161 }
2162 } else {
2163 if (r == 0) {
2164 *p = (short) s;
2165 break;
2166 }
2167 --r;
2168 }
2169 }
2170 } while (k <= j->spec_end);
2171 }
2172 }
2173 return 1;
2174 }
2175
2176 // take a -128..127 value and stbi__clamp it and convert to 0..255
2177 stbi_inline static stbi_uc stbi__clamp(int x)
2178 {
2179 // trick to use a single test to catch both cases
2180 if ((unsigned int) x > 255) {
2181 if (x < 0) return 0;
2182 if (x > 255) return 255;
2183 }
2184 return (stbi_uc) x;
2185 }
2186
2187 #define stbi__f2f(x) ((int) (((x) * 4096 + 0.5)))
2188 #define stbi__fsh(x) ((x) << 12)
2189
2190 // derived from jidctint -- DCT_ISLOW
2191 #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
2192 int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
2193 p2 = s2; \
2194 p3 = s6; \
2195 p1 = (p2+p3) * stbi__f2f(0.5411961f); \
2196 t2 = p1 + p3*stbi__f2f(-1.847759065f); \
2197 t3 = p1 + p2*stbi__f2f( 0.765366865f); \
2198 p2 = s0; \
2199 p3 = s4; \
2200 t0 = stbi__fsh(p2+p3); \
2201 t1 = stbi__fsh(p2-p3); \
2202 x0 = t0+t3; \
2203 x3 = t0-t3; \
2204 x1 = t1+t2; \
2205 x2 = t1-t2; \
2206 t0 = s7; \
2207 t1 = s5; \
2208 t2 = s3; \
2209 t3 = s1; \
2210 p3 = t0+t2; \
2211 p4 = t1+t3; \
2212 p1 = t0+t3; \
2213 p2 = t1+t2; \
2214 p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
2215 t0 = t0*stbi__f2f( 0.298631336f); \
2216 t1 = t1*stbi__f2f( 2.053119869f); \
2217 t2 = t2*stbi__f2f( 3.072711026f); \
2218 t3 = t3*stbi__f2f( 1.501321110f); \
2219 p1 = p5 + p1*stbi__f2f(-0.899976223f); \
2220 p2 = p5 + p2*stbi__f2f(-2.562915447f); \
2221 p3 = p3*stbi__f2f(-1.961570560f); \
2222 p4 = p4*stbi__f2f(-0.390180644f); \
2223 t3 += p1+p4; \
2224 t2 += p2+p3; \
2225 t1 += p2+p4; \
2226 t0 += p1+p3;
2227
2228 static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
2229 {
2230 int i,val[64],*v=val;
2231 stbi_uc *o;
2232 short *d = data;
2233
2234 // columns
2235 for (i=0; i < 8; ++i,++d, ++v) {
2236 // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
2237 if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
2238 && d[40]==0 && d[48]==0 && d[56]==0) {
2239 // no shortcut 0 seconds
2240 // (1|2|3|4|5|6|7)==0 0 seconds
2241 // all separate -0.047 seconds
2242 // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
2243 int dcterm = d[0] << 2;
2244 v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
2245 } else {
2246 STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
2247 // constants scaled things up by 1<<12; let's bring them back
2248 // down, but keep 2 extra bits of precision
2249 x0 += 512; x1 += 512; x2 += 512; x3 += 512;
2250 v[ 0] = (x0+t3) >> 10;
2251 v[56] = (x0-t3) >> 10;
2252 v[ 8] = (x1+t2) >> 10;
2253 v[48] = (x1-t2) >> 10;
2254 v[16] = (x2+t1) >> 10;
2255 v[40] = (x2-t1) >> 10;
2256 v[24] = (x3+t0) >> 10;
2257 v[32] = (x3-t0) >> 10;
2258 }
2259 }
2260
2261 for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
2262 // no fast case since the first 1D IDCT spread components out
2263 STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
2264 // constants scaled things up by 1<<12, plus we had 1<<2 from first
2265 // loop, plus horizontal and vertical each scale by sqrt(8) so together
2266 // we've got an extra 1<<3, so 1<<17 total we need to remove.
2267 // so we want to round that, which means adding 0.5 * 1<<17,
2268 // aka 65536. Also, we'll end up with -128 to 127 that we want
2269 // to encode as 0..255 by adding 128, so we'll add that before the shift
2270 x0 += 65536 + (128<<17);
2271 x1 += 65536 + (128<<17);
2272 x2 += 65536 + (128<<17);
2273 x3 += 65536 + (128<<17);
2274 // tried computing the shifts into temps, or'ing the temps to see
2275 // if any were out of range, but that was slower
2276 o[0] = stbi__clamp((x0+t3) >> 17);
2277 o[7] = stbi__clamp((x0-t3) >> 17);
2278 o[1] = stbi__clamp((x1+t2) >> 17);
2279 o[6] = stbi__clamp((x1-t2) >> 17);
2280 o[2] = stbi__clamp((x2+t1) >> 17);
2281 o[5] = stbi__clamp((x2-t1) >> 17);
2282 o[3] = stbi__clamp((x3+t0) >> 17);
2283 o[4] = stbi__clamp((x3-t0) >> 17);
2284 }
2285 }
2286
2287 #ifdef STBI_SSE2
2288 // sse2 integer IDCT. not the fastest possible implementation but it
2289 // produces bit-identical results to the generic C version so it's
2290 // fully "transparent".
2291 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2292 {
2293 // This is constructed to match our regular (generic) integer IDCT exactly.
2294 __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2295 __m128i tmp;
2296
2297 // dot product constant: even elems=x, odd elems=y
2298 #define dct_const(x,y) _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2299
2300 // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
2301 // out(1) = c1[even]*x + c1[odd]*y
2302 #define dct_rot(out0,out1, x,y,c0,c1) \
2303 __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2304 __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2305 __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2306 __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2307 __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2308 __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2309
2310 // out = in << 12 (in 16-bit, out 32-bit)
2311 #define dct_widen(out, in) \
2312 __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2313 __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2314
2315 // wide add
2316 #define dct_wadd(out, a, b) \
2317 __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2318 __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2319
2320 // wide sub
2321 #define dct_wsub(out, a, b) \
2322 __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2323 __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2324
2325 // butterfly a/b, add bias, then shift by "s" and pack
2326 #define dct_bfly32o(out0, out1, a,b,bias,s) \
2327 { \
2328 __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2329 __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2330 dct_wadd(sum, abiased, b); \
2331 dct_wsub(dif, abiased, b); \
2332 out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2333 out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
2334 }
2335
2336 // 8-bit interleave step (for transposes)
2337 #define dct_interleave8(a, b) \
2338 tmp = a; \
2339 a = _mm_unpacklo_epi8(a, b); \
2340 b = _mm_unpackhi_epi8(tmp, b)
2341
2342 // 16-bit interleave step (for transposes)
2343 #define dct_interleave16(a, b) \
2344 tmp = a; \
2345 a = _mm_unpacklo_epi16(a, b); \
2346 b = _mm_unpackhi_epi16(tmp, b)
2347
2348 #define dct_pass(bias,shift) \
2349 { \
2350 /* even part */ \
2351 dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2352 __m128i sum04 = _mm_add_epi16(row0, row4); \
2353 __m128i dif04 = _mm_sub_epi16(row0, row4); \
2354 dct_widen(t0e, sum04); \
2355 dct_widen(t1e, dif04); \
2356 dct_wadd(x0, t0e, t3e); \
2357 dct_wsub(x3, t0e, t3e); \
2358 dct_wadd(x1, t1e, t2e); \
2359 dct_wsub(x2, t1e, t2e); \
2360 /* odd part */ \
2361 dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2362 dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2363 __m128i sum17 = _mm_add_epi16(row1, row7); \
2364 __m128i sum35 = _mm_add_epi16(row3, row5); \
2365 dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2366 dct_wadd(x4, y0o, y4o); \
2367 dct_wadd(x5, y1o, y5o); \
2368 dct_wadd(x6, y2o, y5o); \
2369 dct_wadd(x7, y3o, y4o); \
2370 dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2371 dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2372 dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2373 dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2374 }
2375
2376 __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2377 __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2378 __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2379 __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2380 __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
2381 __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2382 __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2383 __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2384
2385 // rounding biases in column/row passes, see stbi__idct_block for explanation.
2386 __m128i bias_0 = _mm_set1_epi32(512);
2387 __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2388
2389 // load
2390 row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2391 row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2392 row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2393 row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2394 row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2395 row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2396 row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2397 row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2398
2399 // column pass
2400 dct_pass(bias_0, 10);
2401
2402 {
2403 // 16bit 8x8 transpose pass 1
2404 dct_interleave16(row0, row4);
2405 dct_interleave16(row1, row5);
2406 dct_interleave16(row2, row6);
2407 dct_interleave16(row3, row7);
2408
2409 // transpose pass 2
2410 dct_interleave16(row0, row2);
2411 dct_interleave16(row1, row3);
2412 dct_interleave16(row4, row6);
2413 dct_interleave16(row5, row7);
2414
2415 // transpose pass 3
2416 dct_interleave16(row0, row1);
2417 dct_interleave16(row2, row3);
2418 dct_interleave16(row4, row5);
2419 dct_interleave16(row6, row7);
2420 }
2421
2422 // row pass
2423 dct_pass(bias_1, 17);
2424
2425 {
2426 // pack
2427 __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2428 __m128i p1 = _mm_packus_epi16(row2, row3);
2429 __m128i p2 = _mm_packus_epi16(row4, row5);
2430 __m128i p3 = _mm_packus_epi16(row6, row7);
2431
2432 // 8bit 8x8 transpose pass 1
2433 dct_interleave8(p0, p2); // a0e0a1e1...
2434 dct_interleave8(p1, p3); // c0g0c1g1...
2435
2436 // transpose pass 2
2437 dct_interleave8(p0, p1); // a0c0e0g0...
2438 dct_interleave8(p2, p3); // b0d0f0h0...
2439
2440 // transpose pass 3
2441 dct_interleave8(p0, p2); // a0b0c0d0...
2442 dct_interleave8(p1, p3); // a4b4c4d4...
2443
2444 // store
2445 _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2446 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2447 _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2448 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2449 _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2450 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2451 _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2452 _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2453 }
2454
2455 #undef dct_const
2456 #undef dct_rot
2457 #undef dct_widen
2458 #undef dct_wadd
2459 #undef dct_wsub
2460 #undef dct_bfly32o
2461 #undef dct_interleave8
2462 #undef dct_interleave16
2463 #undef dct_pass
2464 }
2465
2466 #endif // STBI_SSE2
2467
2468 #ifdef STBI_NEON
2469
2470 // NEON integer IDCT. should produce bit-identical
2471 // results to the generic C version.
2472 static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2473 {
2474 int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2475
2476 int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2477 int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2478 int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2479 int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2480 int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2481 int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2482 int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2483 int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2484 int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2485 int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2486 int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2487 int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2488
2489 #define dct_long_mul(out, inq, coeff) \
2490 int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2491 int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2492
2493 #define dct_long_mac(out, acc, inq, coeff) \
2494 int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2495 int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2496
2497 #define dct_widen(out, inq) \
2498 int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2499 int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2500
2501 // wide add
2502 #define dct_wadd(out, a, b) \
2503 int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2504 int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2505
2506 // wide sub
2507 #define dct_wsub(out, a, b) \
2508 int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2509 int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2510
2511 // butterfly a/b, then shift using "shiftop" by "s" and pack
2512 #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2513 { \
2514 dct_wadd(sum, a, b); \
2515 dct_wsub(dif, a, b); \
2516 out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2517 out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2518 }
2519
2520 #define dct_pass(shiftop, shift) \
2521 { \
2522 /* even part */ \
2523 int16x8_t sum26 = vaddq_s16(row2, row6); \
2524 dct_long_mul(p1e, sum26, rot0_0); \
2525 dct_long_mac(t2e, p1e, row6, rot0_1); \
2526 dct_long_mac(t3e, p1e, row2, rot0_2); \
2527 int16x8_t sum04 = vaddq_s16(row0, row4); \
2528 int16x8_t dif04 = vsubq_s16(row0, row4); \
2529 dct_widen(t0e, sum04); \
2530 dct_widen(t1e, dif04); \
2531 dct_wadd(x0, t0e, t3e); \
2532 dct_wsub(x3, t0e, t3e); \
2533 dct_wadd(x1, t1e, t2e); \
2534 dct_wsub(x2, t1e, t2e); \
2535 /* odd part */ \
2536 int16x8_t sum15 = vaddq_s16(row1, row5); \
2537 int16x8_t sum17 = vaddq_s16(row1, row7); \
2538 int16x8_t sum35 = vaddq_s16(row3, row5); \
2539 int16x8_t sum37 = vaddq_s16(row3, row7); \
2540 int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2541 dct_long_mul(p5o, sumodd, rot1_0); \
2542 dct_long_mac(p1o, p5o, sum17, rot1_1); \
2543 dct_long_mac(p2o, p5o, sum35, rot1_2); \
2544 dct_long_mul(p3o, sum37, rot2_0); \
2545 dct_long_mul(p4o, sum15, rot2_1); \
2546 dct_wadd(sump13o, p1o, p3o); \
2547 dct_wadd(sump24o, p2o, p4o); \
2548 dct_wadd(sump23o, p2o, p3o); \
2549 dct_wadd(sump14o, p1o, p4o); \
2550 dct_long_mac(x4, sump13o, row7, rot3_0); \
2551 dct_long_mac(x5, sump24o, row5, rot3_1); \
2552 dct_long_mac(x6, sump23o, row3, rot3_2); \
2553 dct_long_mac(x7, sump14o, row1, rot3_3); \
2554 dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2555 dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2556 dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2557 dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2558 }
2559
2560 // load
2561 row0 = vld1q_s16(data + 0*8);
2562 row1 = vld1q_s16(data + 1*8);
2563 row2 = vld1q_s16(data + 2*8);
2564 row3 = vld1q_s16(data + 3*8);
2565 row4 = vld1q_s16(data + 4*8);
2566 row5 = vld1q_s16(data + 5*8);
2567 row6 = vld1q_s16(data + 6*8);
2568 row7 = vld1q_s16(data + 7*8);
2569
2570 // add DC bias
2571 row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2572
2573 // column pass
2574 dct_pass(vrshrn_n_s32, 10);
2575
2576 // 16bit 8x8 transpose
2577 {
2578 // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2579 // whether compilers actually get this is another story, sadly.
2580 #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2581 #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2582 #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2583
2584 // pass 1
2585 dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2586 dct_trn16(row2, row3);
2587 dct_trn16(row4, row5);
2588 dct_trn16(row6, row7);
2589
2590 // pass 2
2591 dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2592 dct_trn32(row1, row3);
2593 dct_trn32(row4, row6);
2594 dct_trn32(row5, row7);
2595
2596 // pass 3
2597 dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2598 dct_trn64(row1, row5);
2599 dct_trn64(row2, row6);
2600 dct_trn64(row3, row7);
2601
2602 #undef dct_trn16
2603 #undef dct_trn32
2604 #undef dct_trn64
2605 }
2606
2607 // row pass
2608 // vrshrn_n_s32 only supports shifts up to 16, we need
2609 // 17. so do a non-rounding shift of 16 first then follow
2610 // up with a rounding shift by 1.
2611 dct_pass(vshrn_n_s32, 16);
2612
2613 {
2614 // pack and round
2615 uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2616 uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2617 uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2618 uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2619 uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2620 uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2621 uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2622 uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2623
2624 // again, these can translate into one instruction, but often don't.
2625 #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2626 #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2627 #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2628
2629 // sadly can't use interleaved stores here since we only write
2630 // 8 bytes to each scan line!
2631
2632 // 8x8 8-bit transpose pass 1
2633 dct_trn8_8(p0, p1);
2634 dct_trn8_8(p2, p3);
2635 dct_trn8_8(p4, p5);
2636 dct_trn8_8(p6, p7);
2637
2638 // pass 2
2639 dct_trn8_16(p0, p2);
2640 dct_trn8_16(p1, p3);
2641 dct_trn8_16(p4, p6);
2642 dct_trn8_16(p5, p7);
2643
2644 // pass 3
2645 dct_trn8_32(p0, p4);
2646 dct_trn8_32(p1, p5);
2647 dct_trn8_32(p2, p6);
2648 dct_trn8_32(p3, p7);
2649
2650 // store
2651 vst1_u8(out, p0); out += out_stride;
2652 vst1_u8(out, p1); out += out_stride;
2653 vst1_u8(out, p2); out += out_stride;
2654 vst1_u8(out, p3); out += out_stride;
2655 vst1_u8(out, p4); out += out_stride;
2656 vst1_u8(out, p5); out += out_stride;
2657 vst1_u8(out, p6); out += out_stride;
2658 vst1_u8(out, p7);
2659
2660 #undef dct_trn8_8
2661 #undef dct_trn8_16
2662 #undef dct_trn8_32
2663 }
2664
2665 #undef dct_long_mul
2666 #undef dct_long_mac
2667 #undef dct_widen
2668 #undef dct_wadd
2669 #undef dct_wsub
2670 #undef dct_bfly32o
2671 #undef dct_pass
2672 }
2673
2674 #endif // STBI_NEON
2675
2676 #define STBI__MARKER_none 0xff
2677 // if there's a pending marker from the entropy stream, return that
2678 // otherwise, fetch from the stream and get a marker. if there's no
2679 // marker, return 0xff, which is never a valid marker value
2680 static stbi_uc stbi__get_marker(stbi__jpeg *j)
2681 {
2682 stbi_uc x;
2683 if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
2684 x = stbi__get8(j->s);
2685 if (x != 0xff) return STBI__MARKER_none;
2686 while (x == 0xff)
2687 x = stbi__get8(j->s);
2688 return x;
2689 }
2690
2691 // in each scan, we'll have scan_n components, and the order
2692 // of the components is specified by order[]
2693 #define STBI__RESTART(x) ((x) >= 0xd0 && (x) <= 0xd7)
2694
2695 // after a restart interval, stbi__jpeg_reset the entropy decoder and
2696 // the dc prediction
2697 static void stbi__jpeg_reset(stbi__jpeg *j)
2698 {
2699 j->code_bits = 0;
2700 j->code_buffer = 0;
2701 j->nomore = 0;
2702 j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
2703 j->marker = STBI__MARKER_none;
2704 j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2705 j->eob_run = 0;
2706 // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
2707 // since we don't even allow 1<<30 pixels
2708 }
2709
2710 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
2711 {
2712 stbi__jpeg_reset(z);
2713 if (!z->progressive) {
2714 if (z->scan_n == 1) {
2715 int i,j;
2716 STBI_SIMD_ALIGN(short, data[64]);
2717 int n = z->order[0];
2718 // non-interleaved data, we just need to process one block at a time,
2719 // in trivial scanline order
2720 // number of blocks to do just depends on how many actual "pixels" this
2721 // component has, independent of interleaved MCU blocking and such
2722 int w = (z->img_comp[n].x+7) >> 3;
2723 int h = (z->img_comp[n].y+7) >> 3;
2724 for (j=0; j < h; ++j) {
2725 for (i=0; i < w; ++i) {
2726 int ha = z->img_comp[n].ha;
2727 if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2728 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2729 // every data block is an MCU, so countdown the restart interval
2730 if (--z->todo <= 0) {
2731 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2732 // if it's NOT a restart, then just bail, so we get corrupt data
2733 // rather than no data
2734 if (!STBI__RESTART(z->marker)) return 1;
2735 stbi__jpeg_reset(z);
2736 }
2737 }
2738 }
2739 return 1;
2740 } else { // interleaved
2741 int i,j,k,x,y;
2742 STBI_SIMD_ALIGN(short, data[64]);
2743 for (j=0; j < z->img_mcu_y; ++j) {
2744 for (i=0; i < z->img_mcu_x; ++i) {
2745 // scan an interleaved mcu... process scan_n components in order
2746 for (k=0; k < z->scan_n; ++k) {
2747 int n = z->order[k];
2748 // scan out an mcu's worth of this component; that's just determined
2749 // by the basic H and V specified for the component
2750 for (y=0; y < z->img_comp[n].v; ++y) {
2751 for (x=0; x < z->img_comp[n].h; ++x) {
2752 int x2 = (i*z->img_comp[n].h + x)*8;
2753 int y2 = (j*z->img_comp[n].v + y)*8;
2754 int ha = z->img_comp[n].ha;
2755 if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2756 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2757 }
2758 }
2759 }
2760 // after all interleaved components, that's an interleaved MCU,
2761 // so now count down the restart interval
2762 if (--z->todo <= 0) {
2763 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2764 if (!STBI__RESTART(z->marker)) return 1;
2765 stbi__jpeg_reset(z);
2766 }
2767 }
2768 }
2769 return 1;
2770 }
2771 } else {
2772 if (z->scan_n == 1) {
2773 int i,j;
2774 int n = z->order[0];
2775 // non-interleaved data, we just need to process one block at a time,
2776 // in trivial scanline order
2777 // number of blocks to do just depends on how many actual "pixels" this
2778 // component has, independent of interleaved MCU blocking and such
2779 int w = (z->img_comp[n].x+7) >> 3;
2780 int h = (z->img_comp[n].y+7) >> 3;
2781 for (j=0; j < h; ++j) {
2782 for (i=0; i < w; ++i) {
2783 short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2784 if (z->spec_start == 0) {
2785 if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2786 return 0;
2787 } else {
2788 int ha = z->img_comp[n].ha;
2789 if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2790 return 0;
2791 }
2792 // every data block is an MCU, so countdown the restart interval
2793 if (--z->todo <= 0) {
2794 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2795 if (!STBI__RESTART(z->marker)) return 1;
2796 stbi__jpeg_reset(z);
2797 }
2798 }
2799 }
2800 return 1;
2801 } else { // interleaved
2802 int i,j,k,x,y;
2803 for (j=0; j < z->img_mcu_y; ++j) {
2804 for (i=0; i < z->img_mcu_x; ++i) {
2805 // scan an interleaved mcu... process scan_n components in order
2806 for (k=0; k < z->scan_n; ++k) {
2807 int n = z->order[k];
2808 // scan out an mcu's worth of this component; that's just determined
2809 // by the basic H and V specified for the component
2810 for (y=0; y < z->img_comp[n].v; ++y) {
2811 for (x=0; x < z->img_comp[n].h; ++x) {
2812 int x2 = (i*z->img_comp[n].h + x);
2813 int y2 = (j*z->img_comp[n].v + y);
2814 short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2815 if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2816 return 0;
2817 }
2818 }
2819 }
2820 // after all interleaved components, that's an interleaved MCU,
2821 // so now count down the restart interval
2822 if (--z->todo <= 0) {
2823 if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2824 if (!STBI__RESTART(z->marker)) return 1;
2825 stbi__jpeg_reset(z);
2826 }
2827 }
2828 }
2829 return 1;
2830 }
2831 }
2832 }
2833
2834 static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2835 {
2836 int i;
2837 for (i=0; i < 64; ++i)
2838 data[i] *= dequant[i];
2839 }
2840
2841 static void stbi__jpeg_finish(stbi__jpeg *z)
2842 {
2843 if (z->progressive) {
2844 // dequantize and idct the data
2845 int i,j,n;
2846 for (n=0; n < z->s->img_n; ++n) {
2847 int w = (z->img_comp[n].x+7) >> 3;
2848 int h = (z->img_comp[n].y+7) >> 3;
2849 for (j=0; j < h; ++j) {
2850 for (i=0; i < w; ++i) {
2851 short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2852 stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2853 z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2854 }
2855 }
2856 }
2857 }
2858 }
2859
2860 static int stbi__process_marker(stbi__jpeg *z, int m)
2861 {
2862 int L;
2863 switch (m) {
2864 case STBI__MARKER_none: // no marker found
2865 return stbi__err("expected marker","Corrupt JPEG");
2866
2867 case 0xDD: // DRI - specify restart interval
2868 if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
2869 z->restart_interval = stbi__get16be(z->s);
2870 return 1;
2871
2872 case 0xDB: // DQT - define quantization table
2873 L = stbi__get16be(z->s)-2;
2874 while (L > 0) {
2875 int q = stbi__get8(z->s);
2876 int p = q >> 4;
2877 int t = q & 15,i;
2878 if (p != 0) return stbi__err("bad DQT type","Corrupt JPEG");
2879 if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
2880 for (i=0; i < 64; ++i)
2881 z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
2882 L -= 65;
2883 }
2884 return L==0;
2885
2886 case 0xC4: // DHT - define huffman table
2887 L = stbi__get16be(z->s)-2;
2888 while (L > 0) {
2889 stbi_uc *v;
2890 int sizes[16],i,n=0;
2891 int q = stbi__get8(z->s);
2892 int tc = q >> 4;
2893 int th = q & 15;
2894 if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
2895 for (i=0; i < 16; ++i) {
2896 sizes[i] = stbi__get8(z->s);
2897 n += sizes[i];
2898 }
2899 L -= 17;
2900 if (tc == 0) {
2901 if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
2902 v = z->huff_dc[th].values;
2903 } else {
2904 if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
2905 v = z->huff_ac[th].values;
2906 }
2907 for (i=0; i < n; ++i)
2908 v[i] = stbi__get8(z->s);
2909 if (tc != 0)
2910 stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
2911 L -= n;
2912 }
2913 return L==0;
2914 }
2915 // check for comment block or APP blocks
2916 if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
2917 stbi__skip(z->s, stbi__get16be(z->s)-2);
2918 return 1;
2919 }
2920 return 0;
2921 }
2922
2923 // after we see SOS
2924 static int stbi__process_scan_header(stbi__jpeg *z)
2925 {
2926 int i;
2927 int Ls = stbi__get16be(z->s);
2928 z->scan_n = stbi__get8(z->s);
2929 if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2930 if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
2931 for (i=0; i < z->scan_n; ++i) {
2932 int id = stbi__get8(z->s), which;
2933 int q = stbi__get8(z->s);
2934 for (which = 0; which < z->s->img_n; ++which)
2935 if (z->img_comp[which].id == id)
2936 break;
2937 if (which == z->s->img_n) return 0; // no match
2938 z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
2939 z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
2940 z->order[i] = which;
2941 }
2942
2943 {
2944 int aa;
2945 z->spec_start = stbi__get8(z->s);
2946 z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
2947 aa = stbi__get8(z->s);
2948 z->succ_high = (aa >> 4);
2949 z->succ_low = (aa & 15);
2950 if (z->progressive) {
2951 if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2952 return stbi__err("bad SOS", "Corrupt JPEG");
2953 } else {
2954 if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2955 if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2956 z->spec_end = 63;
2957 }
2958 }
2959
2960 return 1;
2961 }
2962
2963 static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
2964 {
2965 int i;
2966 for (i=0; i < ncomp; ++i) {
2967 if (z->img_comp[i].raw_data) {
2968 STBI_FREE(z->img_comp[i].raw_data);
2969 z->img_comp[i].raw_data = NULL;
2970 z->img_comp[i].data = NULL;
2971 }
2972 if (z->img_comp[i].raw_coeff) {
2973 STBI_FREE(z->img_comp[i].raw_coeff);
2974 z->img_comp[i].raw_coeff = 0;
2975 z->img_comp[i].coeff = 0;
2976 }
2977 if (z->img_comp[i].linebuf) {
2978 STBI_FREE(z->img_comp[i].linebuf);
2979 z->img_comp[i].linebuf = NULL;
2980 }
2981 }
2982 return why;
2983 }
2984
2985 static int stbi__process_frame_header(stbi__jpeg *z, int scan)
2986 {
2987 stbi__context *s = z->s;
2988 int Lf,p,i,q, h_max=1,v_max=1,c;
2989 Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2990 p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2991 s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
2992 s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
2993 c = stbi__get8(s);
2994 if (c != 3 && c != 1) return stbi__err("bad component count","Corrupt JPEG"); // JFIF requires
2995 s->img_n = c;
2996 for (i=0; i < c; ++i) {
2997 z->img_comp[i].data = NULL;
2998 z->img_comp[i].linebuf = NULL;
2999 }
3000
3001 if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
3002
3003 z->rgb = 0;
3004 for (i=0; i < s->img_n; ++i) {
3005 static unsigned char rgb[3] = { 'R', 'G', 'B' };
3006 z->img_comp[i].id = stbi__get8(s);
3007 if (z->img_comp[i].id != i+1) // JFIF requires
3008 if (z->img_comp[i].id != i) { // some version of jpegtran outputs non-JFIF-compliant files!
3009 // somethings output this (see http://fileformats.archiveteam.org/wiki/JPEG#Color_format)
3010 if (z->img_comp[i].id != rgb[i])
3011 return stbi__err("bad component ID","Corrupt JPEG");
3012 ++z->rgb;
3013 }
3014 q = stbi__get8(s);
3015 z->img_comp[i].h = (q >> 4); if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
3016 z->img_comp[i].v = q & 15; if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
3017 z->img_comp[i].tq = stbi__get8(s); if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
3018 }
3019
3020 if (scan != STBI__SCAN_load) return 1;
3021
3022 if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
3023
3024 for (i=0; i < s->img_n; ++i) {
3025 if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
3026 if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
3027 }
3028
3029 // compute interleaved mcu info
3030 z->img_h_max = h_max;
3031 z->img_v_max = v_max;
3032 z->img_mcu_w = h_max * 8;
3033 z->img_mcu_h = v_max * 8;
3034 // these sizes can't be more than 17 bits
3035 z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
3036 z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
3037
3038 for (i=0; i < s->img_n; ++i) {
3039 // number of effective pixels (e.g. for non-interleaved MCU)
3040 z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
3041 z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
3042 // to simplify generation, we'll allocate enough memory to decode
3043 // the bogus oversized data from using interleaved MCUs and their
3044 // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
3045 // discard the extra data until colorspace conversion
3046 //
3047 // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
3048 // so these muls can't overflow with 32-bit ints (which we require)
3049 z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
3050 z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
3051 z->img_comp[i].coeff = 0;
3052 z->img_comp[i].raw_coeff = 0;
3053 z->img_comp[i].linebuf = NULL;
3054 z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
3055 if (z->img_comp[i].raw_data == NULL)
3056 return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3057 // align blocks for idct using mmx/sse
3058 z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
3059 if (z->progressive) {
3060 // w2, h2 are multiples of 8 (see above)
3061 z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
3062 z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
3063 z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
3064 if (z->img_comp[i].raw_coeff == NULL)
3065 return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
3066 z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
3067 }
3068 }
3069
3070 return 1;
3071 }
3072
3073 // use comparisons since in some cases we handle more than one case (e.g. SOF)
3074 #define stbi__DNL(x) ((x) == 0xdc)
3075 #define stbi__SOI(x) ((x) == 0xd8)
3076 #define stbi__EOI(x) ((x) == 0xd9)
3077 #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
3078 #define stbi__SOS(x) ((x) == 0xda)
3079
3080 #define stbi__SOF_progressive(x) ((x) == 0xc2)
3081
3082 static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
3083 {
3084 int m;
3085 z->marker = STBI__MARKER_none; // initialize cached marker to empty
3086 m = stbi__get_marker(z);
3087 if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
3088 if (scan == STBI__SCAN_type) return 1;
3089 m = stbi__get_marker(z);
3090 while (!stbi__SOF(m)) {
3091 if (!stbi__process_marker(z,m)) return 0;
3092 m = stbi__get_marker(z);
3093 while (m == STBI__MARKER_none) {
3094 // some files have extra padding after their blocks, so ok, we'll scan
3095 if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
3096 m = stbi__get_marker(z);
3097 }
3098 }
3099 z->progressive = stbi__SOF_progressive(m);
3100 if (!stbi__process_frame_header(z, scan)) return 0;
3101 return 1;
3102 }
3103
3104 // decode image to YCbCr format
3105 static int stbi__decode_jpeg_image(stbi__jpeg *j)
3106 {
3107 int m;
3108 for (m = 0; m < 4; m++) {
3109 j->img_comp[m].raw_data = NULL;
3110 j->img_comp[m].raw_coeff = NULL;
3111 }
3112 j->restart_interval = 0;
3113 if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
3114 m = stbi__get_marker(j);
3115 while (!stbi__EOI(m)) {
3116 if (stbi__SOS(m)) {
3117 if (!stbi__process_scan_header(j)) return 0;
3118 if (!stbi__parse_entropy_coded_data(j)) return 0;
3119 if (j->marker == STBI__MARKER_none ) {
3120 // handle 0s at the end of image data from IP Kamera 9060
3121 while (!stbi__at_eof(j->s)) {
3122 int x = stbi__get8(j->s);
3123 if (x == 255) {
3124 j->marker = stbi__get8(j->s);
3125 break;
3126 } else if (x != 0) {
3127 return stbi__err("junk before marker", "Corrupt JPEG");
3128 }
3129 }
3130 // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
3131 }
3132 } else {
3133 if (!stbi__process_marker(j, m)) return 0;
3134 }
3135 m = stbi__get_marker(j);
3136 }
3137 if (j->progressive)
3138 stbi__jpeg_finish(j);
3139 return 1;
3140 }
3141
3142 // static jfif-centered resampling (across block boundaries)
3143
3144 typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
3145 int w, int hs);
3146
3147 #define stbi__div4(x) ((stbi_uc) ((x) >> 2))
3148
3149 static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3150 {
3151 STBI_NOTUSED(out);
3152 STBI_NOTUSED(in_far);
3153 STBI_NOTUSED(w);
3154 STBI_NOTUSED(hs);
3155 return in_near;
3156 }
3157
3158 static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3159 {
3160 // need to generate two samples vertically for every one in input
3161 int i;
3162 STBI_NOTUSED(hs);
3163 for (i=0; i < w; ++i)
3164 out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
3165 return out;
3166 }
3167
3168 static stbi_uc* stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3169 {
3170 // need to generate two samples horizontally for every one in input
3171 int i;
3172 stbi_uc *input = in_near;
3173
3174 if (w == 1) {
3175 // if only one sample, can't do any interpolation
3176 out[0] = out[1] = input[0];
3177 return out;
3178 }
3179
3180 out[0] = input[0];
3181 out[1] = stbi__div4(input[0]*3 + input[1] + 2);
3182 for (i=1; i < w-1; ++i) {
3183 int n = 3*input[i]+2;
3184 out[i*2+0] = stbi__div4(n+input[i-1]);
3185 out[i*2+1] = stbi__div4(n+input[i+1]);
3186 }
3187 out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
3188 out[i*2+1] = input[w-1];
3189
3190 STBI_NOTUSED(in_far);
3191 STBI_NOTUSED(hs);
3192
3193 return out;
3194 }
3195
3196 #define stbi__div16(x) ((stbi_uc) ((x) >> 4))
3197
3198 static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3199 {
3200 // need to generate 2x2 samples for every one in input
3201 int i,t0,t1;
3202 if (w == 1) {
3203 out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3204 return out;
3205 }
3206
3207 t1 = 3*in_near[0] + in_far[0];
3208 out[0] = stbi__div4(t1+2);
3209 for (i=1; i < w; ++i) {
3210 t0 = t1;
3211 t1 = 3*in_near[i]+in_far[i];
3212 out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3213 out[i*2 ] = stbi__div16(3*t1 + t0 + 8);
3214 }
3215 out[w*2-1] = stbi__div4(t1+2);
3216
3217 STBI_NOTUSED(hs);
3218
3219 return out;
3220 }
3221
3222 #if defined(STBI_SSE2) || defined(STBI_NEON)
3223 static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3224 {
3225 // need to generate 2x2 samples for every one in input
3226 int i=0,t0,t1;
3227
3228 if (w == 1) {
3229 out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
3230 return out;
3231 }
3232
3233 t1 = 3*in_near[0] + in_far[0];
3234 // process groups of 8 pixels for as long as we can.
3235 // note we can't handle the last pixel in a row in this loop
3236 // because we need to handle the filter boundary conditions.
3237 for (; i < ((w-1) & ~7); i += 8) {
3238 #if defined(STBI_SSE2)
3239 // load and perform the vertical filtering pass
3240 // this uses 3*x + y = 4*x + (y - x)
3241 __m128i zero = _mm_setzero_si128();
3242 __m128i farb = _mm_loadl_epi64((__m128i *) (in_far + i));
3243 __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
3244 __m128i farw = _mm_unpacklo_epi8(farb, zero);
3245 __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
3246 __m128i diff = _mm_sub_epi16(farw, nearw);
3247 __m128i nears = _mm_slli_epi16(nearw, 2);
3248 __m128i curr = _mm_add_epi16(nears, diff); // current row
3249
3250 // horizontal filter works the same based on shifted vers of current
3251 // row. "prev" is current row shifted right by 1 pixel; we need to
3252 // insert the previous pixel value (from t1).
3253 // "next" is current row shifted left by 1 pixel, with first pixel
3254 // of next block of 8 pixels added in.
3255 __m128i prv0 = _mm_slli_si128(curr, 2);
3256 __m128i nxt0 = _mm_srli_si128(curr, 2);
3257 __m128i prev = _mm_insert_epi16(prv0, t1, 0);
3258 __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
3259
3260 // horizontal filter, polyphase implementation since it's convenient:
3261 // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3262 // odd pixels = 3*cur + next = cur*4 + (next - cur)
3263 // note the shared term.
3264 __m128i bias = _mm_set1_epi16(8);
3265 __m128i curs = _mm_slli_epi16(curr, 2);
3266 __m128i prvd = _mm_sub_epi16(prev, curr);
3267 __m128i nxtd = _mm_sub_epi16(next, curr);
3268 __m128i curb = _mm_add_epi16(curs, bias);
3269 __m128i even = _mm_add_epi16(prvd, curb);
3270 __m128i odd = _mm_add_epi16(nxtd, curb);
3271
3272 // interleave even and odd pixels, then undo scaling.
3273 __m128i int0 = _mm_unpacklo_epi16(even, odd);
3274 __m128i int1 = _mm_unpackhi_epi16(even, odd);
3275 __m128i de0 = _mm_srli_epi16(int0, 4);
3276 __m128i de1 = _mm_srli_epi16(int1, 4);
3277
3278 // pack and write output
3279 __m128i outv = _mm_packus_epi16(de0, de1);
3280 _mm_storeu_si128((__m128i *) (out + i*2), outv);
3281 #elif defined(STBI_NEON)
3282 // load and perform the vertical filtering pass
3283 // this uses 3*x + y = 4*x + (y - x)
3284 uint8x8_t farb = vld1_u8(in_far + i);
3285 uint8x8_t nearb = vld1_u8(in_near + i);
3286 int16x8_t diff = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
3287 int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
3288 int16x8_t curr = vaddq_s16(nears, diff); // current row
3289
3290 // horizontal filter works the same based on shifted vers of current
3291 // row. "prev" is current row shifted right by 1 pixel; we need to
3292 // insert the previous pixel value (from t1).
3293 // "next" is current row shifted left by 1 pixel, with first pixel
3294 // of next block of 8 pixels added in.
3295 int16x8_t prv0 = vextq_s16(curr, curr, 7);
3296 int16x8_t nxt0 = vextq_s16(curr, curr, 1);
3297 int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
3298 int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
3299
3300 // horizontal filter, polyphase implementation since it's convenient:
3301 // even pixels = 3*cur + prev = cur*4 + (prev - cur)
3302 // odd pixels = 3*cur + next = cur*4 + (next - cur)
3303 // note the shared term.
3304 int16x8_t curs = vshlq_n_s16(curr, 2);
3305 int16x8_t prvd = vsubq_s16(prev, curr);
3306 int16x8_t nxtd = vsubq_s16(next, curr);
3307 int16x8_t even = vaddq_s16(curs, prvd);
3308 int16x8_t odd = vaddq_s16(curs, nxtd);
3309
3310 // undo scaling and round, then store with even/odd phases interleaved
3311 uint8x8x2_t o;
3312 o.val[0] = vqrshrun_n_s16(even, 4);
3313 o.val[1] = vqrshrun_n_s16(odd, 4);
3314 vst2_u8(out + i*2, o);
3315 #endif
3316
3317 // "previous" value for next iter
3318 t1 = 3*in_near[i+7] + in_far[i+7];
3319 }
3320
3321 t0 = t1;
3322 t1 = 3*in_near[i] + in_far[i];
3323 out[i*2] = stbi__div16(3*t1 + t0 + 8);
3324
3325 for (++i; i < w; ++i) {
3326 t0 = t1;
3327 t1 = 3*in_near[i]+in_far[i];
3328 out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
3329 out[i*2 ] = stbi__div16(3*t1 + t0 + 8);
3330 }
3331 out[w*2-1] = stbi__div4(t1+2);
3332
3333 STBI_NOTUSED(hs);
3334
3335 return out;
3336 }
3337 #endif
3338
3339 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
3340 {
3341 // resample with nearest-neighbor
3342 int i,j;
3343 STBI_NOTUSED(in_far);
3344 for (i=0; i < w; ++i)
3345 for (j=0; j < hs; ++j)
3346 out[i*hs+j] = in_near[i];
3347 return out;
3348 }
3349
3350 #ifdef STBI_JPEG_OLD
3351 // this is the same YCbCr-to-RGB calculation that stb_image has used
3352 // historically before the algorithm changes in 1.49
3353 #define float2fixed(x) ((int) ((x) * 65536 + 0.5))
3354 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3355 {
3356 int i;
3357 for (i=0; i < count; ++i) {
3358 int y_fixed = (y[i] << 16) + 32768; // rounding
3359 int r,g,b;
3360 int cr = pcr[i] - 128;
3361 int cb = pcb[i] - 128;
3362 r = y_fixed + cr*float2fixed(1.40200f);
3363 g = y_fixed - cr*float2fixed(0.71414f) - cb*float2fixed(0.34414f);
3364 b = y_fixed + cb*float2fixed(1.77200f);
3365 r >>= 16;
3366 g >>= 16;
3367 b >>= 16;
3368 if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3369 if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3370 if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3371 out[0] = (stbi_uc)r;
3372 out[1] = (stbi_uc)g;
3373 out[2] = (stbi_uc)b;
3374 out[3] = 255;
3375 out += step;
3376 }
3377 }
3378 #else
3379 // this is a reduced-precision calculation of YCbCr-to-RGB introduced
3380 // to make sure the code produces the same results in both SIMD and scalar
3381 #define float2fixed(x) (((int) ((x) * 4096.0f + 0.5f)) << 8)
3382 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
3383 {
3384 int i;
3385 for (i=0; i < count; ++i) {
3386 int y_fixed = (y[i] << 20) + (1<<19); // rounding
3387 int r,g,b;
3388 int cr = pcr[i] - 128;
3389 int cb = pcb[i] - 128;
3390 r = y_fixed + cr* float2fixed(1.40200f);
3391 g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3392 b = y_fixed + cb* float2fixed(1.77200f);
3393 r >>= 20;
3394 g >>= 20;
3395 b >>= 20;
3396 if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3397 if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3398 if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3399 out[0] = (stbi_uc)r;
3400 out[1] = (stbi_uc)g;
3401 out[2] = (stbi_uc)b;
3402 out[3] = 255;
3403 out += step;
3404 }
3405 }
3406 #endif
3407
3408 #if defined(STBI_SSE2) || defined(STBI_NEON)
3409 static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
3410 {
3411 int i = 0;
3412
3413 #ifdef STBI_SSE2
3414 // step == 3 is pretty ugly on the final interleave, and i'm not convinced
3415 // it's useful in practice (you wouldn't use it for textures, for example).
3416 // so just accelerate step == 4 case.
3417 if (step == 4) {
3418 // this is a fairly straightforward implementation and not super-optimized.
3419 __m128i signflip = _mm_set1_epi8(-0x80);
3420 __m128i cr_const0 = _mm_set1_epi16( (short) ( 1.40200f*4096.0f+0.5f));
3421 __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
3422 __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
3423 __m128i cb_const1 = _mm_set1_epi16( (short) ( 1.77200f*4096.0f+0.5f));
3424 __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
3425 __m128i xw = _mm_set1_epi16(255); // alpha channel
3426
3427 for (; i+7 < count; i += 8) {
3428 // load
3429 __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
3430 __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
3431 __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
3432 __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
3433 __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
3434
3435 // unpack to short (and left-shift cr, cb by 8)
3436 __m128i yw = _mm_unpacklo_epi8(y_bias, y_bytes);
3437 __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
3438 __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
3439
3440 // color transform
3441 __m128i yws = _mm_srli_epi16(yw, 4);
3442 __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
3443 __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
3444 __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
3445 __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
3446 __m128i rws = _mm_add_epi16(cr0, yws);
3447 __m128i gwt = _mm_add_epi16(cb0, yws);
3448 __m128i bws = _mm_add_epi16(yws, cb1);
3449 __m128i gws = _mm_add_epi16(gwt, cr1);
3450
3451 // descale
3452 __m128i rw = _mm_srai_epi16(rws, 4);
3453 __m128i bw = _mm_srai_epi16(bws, 4);
3454 __m128i gw = _mm_srai_epi16(gws, 4);
3455
3456 // back to byte, set up for transpose
3457 __m128i brb = _mm_packus_epi16(rw, bw);
3458 __m128i gxb = _mm_packus_epi16(gw, xw);
3459
3460 // transpose to interleave channels
3461 __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
3462 __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
3463 __m128i o0 = _mm_unpacklo_epi16(t0, t1);
3464 __m128i o1 = _mm_unpackhi_epi16(t0, t1);
3465
3466 // store
3467 _mm_storeu_si128((__m128i *) (out + 0), o0);
3468 _mm_storeu_si128((__m128i *) (out + 16), o1);
3469 out += 32;
3470 }
3471 }
3472 #endif
3473
3474 #ifdef STBI_NEON
3475 // in this version, step=3 support would be easy to add. but is there demand?
3476 if (step == 4) {
3477 // this is a fairly straightforward implementation and not super-optimized.
3478 uint8x8_t signflip = vdup_n_u8(0x80);
3479 int16x8_t cr_const0 = vdupq_n_s16( (short) ( 1.40200f*4096.0f+0.5f));
3480 int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
3481 int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
3482 int16x8_t cb_const1 = vdupq_n_s16( (short) ( 1.77200f*4096.0f+0.5f));
3483
3484 for (; i+7 < count; i += 8) {
3485 // load
3486 uint8x8_t y_bytes = vld1_u8(y + i);
3487 uint8x8_t cr_bytes = vld1_u8(pcr + i);
3488 uint8x8_t cb_bytes = vld1_u8(pcb + i);
3489 int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
3490 int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
3491
3492 // expand to s16
3493 int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
3494 int16x8_t crw = vshll_n_s8(cr_biased, 7);
3495 int16x8_t cbw = vshll_n_s8(cb_biased, 7);
3496
3497 // color transform
3498 int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
3499 int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
3500 int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
3501 int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
3502 int16x8_t rws = vaddq_s16(yws, cr0);
3503 int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
3504 int16x8_t bws = vaddq_s16(yws, cb1);
3505
3506 // undo scaling, round, convert to byte
3507 uint8x8x4_t o;
3508 o.val[0] = vqrshrun_n_s16(rws, 4);
3509 o.val[1] = vqrshrun_n_s16(gws, 4);
3510 o.val[2] = vqrshrun_n_s16(bws, 4);
3511 o.val[3] = vdup_n_u8(255);
3512
3513 // store, interleaving r/g/b/a
3514 vst4_u8(out, o);
3515 out += 8*4;
3516 }
3517 }
3518 #endif
3519
3520 for (; i < count; ++i) {
3521 int y_fixed = (y[i] << 20) + (1<<19); // rounding
3522 int r,g,b;
3523 int cr = pcr[i] - 128;
3524 int cb = pcb[i] - 128;
3525 r = y_fixed + cr* float2fixed(1.40200f);
3526 g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000);
3527 b = y_fixed + cb* float2fixed(1.77200f);
3528 r >>= 20;
3529 g >>= 20;
3530 b >>= 20;
3531 if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
3532 if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
3533 if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
3534 out[0] = (stbi_uc)r;
3535 out[1] = (stbi_uc)g;
3536 out[2] = (stbi_uc)b;
3537 out[3] = 255;
3538 out += step;
3539 }
3540 }
3541 #endif
3542
3543 // set up the kernels
3544 static void stbi__setup_jpeg(stbi__jpeg *j)
3545 {
3546 j->idct_block_kernel = stbi__idct_block;
3547 j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
3548 j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
3549
3550 #ifdef STBI_SSE2
3551 if (stbi__sse2_available()) {
3552 j->idct_block_kernel = stbi__idct_simd;
3553 #ifndef STBI_JPEG_OLD
3554 j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3555 #endif
3556 j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3557 }
3558 #endif
3559
3560 #ifdef STBI_NEON
3561 j->idct_block_kernel = stbi__idct_simd;
3562 #ifndef STBI_JPEG_OLD
3563 j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
3564 #endif
3565 j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
3566 #endif
3567 }
3568
3569 // clean up the temporary component buffers
3570 static void stbi__cleanup_jpeg(stbi__jpeg *j)
3571 {
3572 stbi__free_jpeg_components(j, j->s->img_n, 0);
3573 }
3574
3575 typedef struct
3576 {
3577 resample_row_func resample;
3578 stbi_uc *line0,*line1;
3579 int hs,vs; // expansion factor in each axis
3580 int w_lores; // horizontal pixels pre-expansion
3581 int ystep; // how far through vertical expansion we are
3582 int ypos; // which pre-expansion row we're on
3583 } stbi__resample;
3584
3585 static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
3586 {
3587 int n, decode_n;
3588 z->s->img_n = 0; // make stbi__cleanup_jpeg safe
3589
3590 // validate req_comp
3591 if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
3592
3593 // load a jpeg image from whichever source, but leave in YCbCr format
3594 if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
3595
3596 // determine actual number of components to generate
3597 n = req_comp ? req_comp : z->s->img_n;
3598
3599 if (z->s->img_n == 3 && n < 3)
3600 decode_n = 1;
3601 else
3602 decode_n = z->s->img_n;
3603
3604 // resample and color-convert
3605 {
3606 int k;
3607 unsigned int i,j;
3608 stbi_uc *output;
3609 stbi_uc *coutput[4];
3610
3611 stbi__resample res_comp[4];
3612
3613 for (k=0; k < decode_n; ++k) {
3614 stbi__resample *r = &res_comp[k];
3615
3616 // allocate line buffer big enough for upsampling off the edges
3617 // with upsample factor of 4
3618 z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
3619 if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3620
3621 r->hs = z->img_h_max / z->img_comp[k].h;
3622 r->vs = z->img_v_max / z->img_comp[k].v;
3623 r->ystep = r->vs >> 1;
3624 r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
3625 r->ypos = 0;
3626 r->line0 = r->line1 = z->img_comp[k].data;
3627
3628 if (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
3629 else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
3630 else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
3631 else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
3632 else r->resample = stbi__resample_row_generic;
3633 }
3634
3635 // can't error after this so, this is safe
3636 output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
3637 if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
3638
3639 // now go ahead and resample
3640 for (j=0; j < z->s->img_y; ++j) {
3641 stbi_uc *out = output + n * z->s->img_x * j;
3642 for (k=0; k < decode_n; ++k) {
3643 stbi__resample *r = &res_comp[k];
3644 int y_bot = r->ystep >= (r->vs >> 1);
3645 coutput[k] = r->resample(z->img_comp[k].linebuf,
3646 y_bot ? r->line1 : r->line0,
3647 y_bot ? r->line0 : r->line1,
3648 r->w_lores, r->hs);
3649 if (++r->ystep >= r->vs) {
3650 r->ystep = 0;
3651 r->line0 = r->line1;
3652 if (++r->ypos < z->img_comp[k].y)
3653 r->line1 += z->img_comp[k].w2;
3654 }
3655 }
3656 if (n >= 3) {
3657 stbi_uc *y = coutput[0];
3658 if (z->s->img_n == 3) {
3659 if (z->rgb == 3) {
3660 for (i=0; i < z->s->img_x; ++i) {
3661 out[0] = y[i];
3662 out[1] = coutput[1][i];
3663 out[2] = coutput[2][i];
3664 out[3] = 255;
3665 out += n;
3666 }
3667 } else {
3668 z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
3669 }
3670 } else
3671 for (i=0; i < z->s->img_x; ++i) {
3672 out[0] = out[1] = out[2] = y[i];
3673 out[3] = 255; // not used if n==3
3674 out += n;
3675 }
3676 } else {
3677 stbi_uc *y = coutput[0];
3678 if (n == 1)
3679 for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
3680 else
3681 for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255;
3682 }
3683 }
3684 stbi__cleanup_jpeg(z);
3685 *out_x = z->s->img_x;
3686 *out_y = z->s->img_y;
3687 if (comp) *comp = z->s->img_n; // report original components, not output
3688 return output;
3689 }
3690 }
3691
3692 static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
3693 {
3694 unsigned char* result;
3695 stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
3696 j->s = s;
3697 stbi__setup_jpeg(j);
3698 result = load_jpeg_image(j, x,y,comp,req_comp);
3699 STBI_FREE(j);
3700 return result;
3701 }
3702
3703 static int stbi__jpeg_test(stbi__context *s)
3704 {
3705 int r;
3706 stbi__jpeg j;
3707 j.s = s;
3708 stbi__setup_jpeg(&j);
3709 r = stbi__decode_jpeg_header(&j, STBI__SCAN_type);
3710 stbi__rewind(s);
3711 return r;
3712 }
3713
3714 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
3715 {
3716 if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
3717 stbi__rewind( j->s );
3718 return 0;
3719 }
3720 if (x) *x = j->s->img_x;
3721 if (y) *y = j->s->img_y;
3722 if (comp) *comp = j->s->img_n;
3723 return 1;
3724 }
3725
3726 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
3727 {
3728 int result;
3729 stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
3730 j->s = s;
3731 result = stbi__jpeg_info_raw(j, x, y, comp);
3732 STBI_FREE(j);
3733 return result;
3734 }
3735 #endif
3736
3737 // public domain zlib decode v0.2 Sean Barrett 2006-11-18
3738 // simple implementation
3739 // - all input must be provided in an upfront buffer
3740 // - all output is written to a single output buffer (can malloc/realloc)
3741 // performance
3742 // - fast huffman
3743
3744 #ifndef STBI_NO_ZLIB
3745
3746 // fast-way is faster to check than jpeg huffman, but slow way is slower
3747 #define STBI__ZFAST_BITS 9 // accelerate all cases in default tables
3748 #define STBI__ZFAST_MASK ((1 << STBI__ZFAST_BITS) - 1)
3749
3750 // zlib-style huffman encoding
3751 // (jpegs packs from left, zlib from right, so can't share code)
3752 typedef struct
3753 {
3754 stbi__uint16 fast[1 << STBI__ZFAST_BITS];
3755 stbi__uint16 firstcode[16];
3756 int maxcode[17];
3757 stbi__uint16 firstsymbol[16];
3758 stbi_uc size[288];
3759 stbi__uint16 value[288];
3760 } stbi__zhuffman;
3761
3762 stbi_inline static int stbi__bitreverse16(int n)
3763 {
3764 n = ((n & 0xAAAA) >> 1) | ((n & 0x5555) << 1);
3765 n = ((n & 0xCCCC) >> 2) | ((n & 0x3333) << 2);
3766 n = ((n & 0xF0F0) >> 4) | ((n & 0x0F0F) << 4);
3767 n = ((n & 0xFF00) >> 8) | ((n & 0x00FF) << 8);
3768 return n;
3769 }
3770
3771 stbi_inline static int stbi__bit_reverse(int v, int bits)
3772 {
3773 STBI_ASSERT(bits <= 16);
3774 // to bit reverse n bits, reverse 16 and shift
3775 // e.g. 11 bits, bit reverse and shift away 5
3776 return stbi__bitreverse16(v) >> (16-bits);
3777 }
3778
3779 static int stbi__zbuild_huffman(stbi__zhuffman *z, stbi_uc *sizelist, int num)
3780 {
3781 int i,k=0;
3782 int code, next_code[16], sizes[17];
3783
3784 // DEFLATE spec for generating codes
3785 memset(sizes, 0, sizeof(sizes));
3786 memset(z->fast, 0, sizeof(z->fast));
3787 for (i=0; i < num; ++i)
3788 ++sizes[sizelist[i]];
3789 sizes[0] = 0;
3790 for (i=1; i < 16; ++i)
3791 if (sizes[i] > (1 << i))
3792 return stbi__err("bad sizes", "Corrupt PNG");
3793 code = 0;
3794 for (i=1; i < 16; ++i) {
3795 next_code[i] = code;
3796 z->firstcode[i] = (stbi__uint16) code;
3797 z->firstsymbol[i] = (stbi__uint16) k;
3798 code = (code + sizes[i]);
3799 if (sizes[i])
3800 if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
3801 z->maxcode[i] = code << (16-i); // preshift for inner loop
3802 code <<= 1;
3803 k += sizes[i];
3804 }
3805 z->maxcode[16] = 0x10000; // sentinel
3806 for (i=0; i < num; ++i) {
3807 int s = sizelist[i];
3808 if (s) {
3809 int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
3810 stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
3811 z->size [c] = (stbi_uc ) s;
3812 z->value[c] = (stbi__uint16) i;
3813 if (s <= STBI__ZFAST_BITS) {
3814 int j = stbi__bit_reverse(next_code[s],s);
3815 while (j < (1 << STBI__ZFAST_BITS)) {
3816 z->fast[j] = fastv;
3817 j += (1 << s);
3818 }
3819 }
3820 ++next_code[s];
3821 }
3822 }
3823 return 1;
3824 }
3825
3826 // zlib-from-memory implementation for PNG reading
3827 // because PNG allows splitting the zlib stream arbitrarily,
3828 // and it's annoying structurally to have PNG call ZLIB call PNG,
3829 // we require PNG read all the IDATs and combine them into a single
3830 // memory buffer
3831
3832 typedef struct
3833 {
3834 stbi_uc *zbuffer, *zbuffer_end;
3835 int num_bits;
3836 stbi__uint32 code_buffer;
3837
3838 char *zout;
3839 char *zout_start;
3840 char *zout_end;
3841 int z_expandable;
3842
3843 stbi__zhuffman z_length, z_distance;
3844 } stbi__zbuf;
3845
3846 stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
3847 {
3848 if (z->zbuffer >= z->zbuffer_end) return 0;
3849 return *z->zbuffer++;
3850 }
3851
3852 static void stbi__fill_bits(stbi__zbuf *z)
3853 {
3854 do {
3855 STBI_ASSERT(z->code_buffer < (1U << z->num_bits));
3856 z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
3857 z->num_bits += 8;
3858 } while (z->num_bits <= 24);
3859 }
3860
3861 stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
3862 {
3863 unsigned int k;
3864 if (z->num_bits < n) stbi__fill_bits(z);
3865 k = z->code_buffer & ((1 << n) - 1);
3866 z->code_buffer >>= n;
3867 z->num_bits -= n;
3868 return k;
3869 }
3870
3871 static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
3872 {
3873 int b,s,k;
3874 // not resolved by fast table, so compute it the slow way
3875 // use jpeg approach, which requires MSbits at top
3876 k = stbi__bit_reverse(a->code_buffer, 16);
3877 for (s=STBI__ZFAST_BITS+1; ; ++s)
3878 if (k < z->maxcode[s])
3879 break;
3880 if (s == 16) return -1; // invalid code!
3881 // code size is s, so:
3882 b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
3883 STBI_ASSERT(z->size[b] == s);
3884 a->code_buffer >>= s;
3885 a->num_bits -= s;
3886 return z->value[b];
3887 }
3888
3889 stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
3890 {
3891 int b,s;
3892 if (a->num_bits < 16) stbi__fill_bits(a);
3893 b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
3894 if (b) {
3895 s = b >> 9;
3896 a->code_buffer >>= s;
3897 a->num_bits -= s;
3898 return b & 511;
3899 }
3900 return stbi__zhuffman_decode_slowpath(a, z);
3901 }
3902
3903 static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes
3904 {
3905 char *q;
3906 int cur, limit, old_limit;
3907 z->zout = zout;
3908 if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
3909 cur = (int) (z->zout - z->zout_start);
3910 limit = old_limit = (int) (z->zout_end - z->zout_start);
3911 while (cur + n > limit)
3912 limit *= 2;
3913 q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
3914 STBI_NOTUSED(old_limit);
3915 if (q == NULL) return stbi__err("outofmem", "Out of memory");
3916 z->zout_start = q;
3917 z->zout = q + cur;
3918 z->zout_end = q + limit;
3919 return 1;
3920 }
3921
3922 static int stbi__zlength_base[31] = {
3923 3,4,5,6,7,8,9,10,11,13,
3924 15,17,19,23,27,31,35,43,51,59,
3925 67,83,99,115,131,163,195,227,258,0,0 };
3926
3927 static int stbi__zlength_extra[31]=
3928 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
3929
3930 static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
3931 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
3932
3933 static int stbi__zdist_extra[32] =
3934 { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
3935
3936 static int stbi__parse_huffman_block(stbi__zbuf *a)
3937 {
3938 char *zout = a->zout;
3939 for(;;) {
3940 int z = stbi__zhuffman_decode(a, &a->z_length);
3941 if (z < 256) {
3942 if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
3943 if (zout >= a->zout_end) {
3944 if (!stbi__zexpand(a, zout, 1)) return 0;
3945 zout = a->zout;
3946 }
3947 *zout++ = (char) z;
3948 } else {
3949 stbi_uc *p;
3950 int len,dist;
3951 if (z == 256) {
3952 a->zout = zout;
3953 return 1;
3954 }
3955 z -= 257;
3956 len = stbi__zlength_base[z];
3957 if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
3958 z = stbi__zhuffman_decode(a, &a->z_distance);
3959 if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
3960 dist = stbi__zdist_base[z];
3961 if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
3962 if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
3963 if (zout + len > a->zout_end) {
3964 if (!stbi__zexpand(a, zout, len)) return 0;
3965 zout = a->zout;
3966 }
3967 p = (stbi_uc *) (zout - dist);
3968 if (dist == 1) { // run of one byte; common in images.
3969 stbi_uc v = *p;
3970 if (len) { do *zout++ = v; while (--len); }
3971 } else {
3972 if (len) { do *zout++ = *p++; while (--len); }
3973 }
3974 }
3975 }
3976 }
3977
3978 static int stbi__compute_huffman_codes(stbi__zbuf *a)
3979 {
3980 static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
3981 stbi__zhuffman z_codelength;
3982 stbi_uc lencodes[286+32+137];//padding for maximum single op
3983 stbi_uc codelength_sizes[19];
3984 int i,n;
3985
3986 int hlit = stbi__zreceive(a,5) + 257;
3987 int hdist = stbi__zreceive(a,5) + 1;
3988 int hclen = stbi__zreceive(a,4) + 4;
3989 int ntot = hlit + hdist;
3990
3991 memset(codelength_sizes, 0, sizeof(codelength_sizes));
3992 for (i=0; i < hclen; ++i) {
3993 int s = stbi__zreceive(a,3);
3994 codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
3995 }
3996 if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
3997
3998 n = 0;
3999 while (n < ntot) {
4000 int c = stbi__zhuffman_decode(a, &z_codelength);
4001 if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
4002 if (c < 16)
4003 lencodes[n++] = (stbi_uc) c;
4004 else {
4005 stbi_uc fill = 0;
4006 if (c == 16) {
4007 c = stbi__zreceive(a,2)+3;
4008 if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
4009 fill = lencodes[n-1];
4010 } else if (c == 17)
4011 c = stbi__zreceive(a,3)+3;
4012 else {
4013 STBI_ASSERT(c == 18);
4014 c = stbi__zreceive(a,7)+11;
4015 }
4016 if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
4017 memset(lencodes+n, fill, c);
4018 n += c;
4019 }
4020 }
4021 if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
4022 if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
4023 if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
4024 return 1;
4025 }
4026
4027 static int stbi__parse_uncompressed_block(stbi__zbuf *a)
4028 {
4029 stbi_uc header[4];
4030 int len,nlen,k;
4031 if (a->num_bits & 7)
4032 stbi__zreceive(a, a->num_bits & 7); // discard
4033 // drain the bit-packed data into header
4034 k = 0;
4035 while (a->num_bits > 0) {
4036 header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
4037 a->code_buffer >>= 8;
4038 a->num_bits -= 8;
4039 }
4040 STBI_ASSERT(a->num_bits == 0);
4041 // now fill header the normal way
4042 while (k < 4)
4043 header[k++] = stbi__zget8(a);
4044 len = header[1] * 256 + header[0];
4045 nlen = header[3] * 256 + header[2];
4046 if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
4047 if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
4048 if (a->zout + len > a->zout_end)
4049 if (!stbi__zexpand(a, a->zout, len)) return 0;
4050 memcpy(a->zout, a->zbuffer, len);
4051 a->zbuffer += len;
4052 a->zout += len;
4053 return 1;
4054 }
4055
4056 static int stbi__parse_zlib_header(stbi__zbuf *a)
4057 {
4058 int cmf = stbi__zget8(a);
4059 int cm = cmf & 15;
4060 /* int cinfo = cmf >> 4; */
4061 int flg = stbi__zget8(a);
4062 if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
4063 if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
4064 if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
4065 // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
4066 return 1;
4067 }
4068
4069 // @TODO: should statically initialize these for optimal thread safety
4070 static stbi_uc stbi__zdefault_length[288], stbi__zdefault_distance[32];
4071 static void stbi__init_zdefaults(void)
4072 {
4073 int i; // use <= to match clearly with spec
4074 for (i=0; i <= 143; ++i) stbi__zdefault_length[i] = 8;
4075 for ( ; i <= 255; ++i) stbi__zdefault_length[i] = 9;
4076 for ( ; i <= 279; ++i) stbi__zdefault_length[i] = 7;
4077 for ( ; i <= 287; ++i) stbi__zdefault_length[i] = 8;
4078
4079 for (i=0; i <= 31; ++i) stbi__zdefault_distance[i] = 5;
4080 }
4081
4082 static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
4083 {
4084 int final, type;
4085 if (parse_header)
4086 if (!stbi__parse_zlib_header(a)) return 0;
4087 a->num_bits = 0;
4088 a->code_buffer = 0;
4089 do {
4090 final = stbi__zreceive(a,1);
4091 type = stbi__zreceive(a,2);
4092 if (type == 0) {
4093 if (!stbi__parse_uncompressed_block(a)) return 0;
4094 } else if (type == 3) {
4095 return 0;
4096 } else {
4097 if (type == 1) {
4098 // use fixed code lengths
4099 if (!stbi__zdefault_distance[31]) stbi__init_zdefaults();
4100 if (!stbi__zbuild_huffman(&a->z_length , stbi__zdefault_length , 288)) return 0;
4101 if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance, 32)) return 0;
4102 } else {
4103 if (!stbi__compute_huffman_codes(a)) return 0;
4104 }
4105 if (!stbi__parse_huffman_block(a)) return 0;
4106 }
4107 } while (!final);
4108 return 1;
4109 }
4110
4111 static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
4112 {
4113 a->zout_start = obuf;
4114 a->zout = obuf;
4115 a->zout_end = obuf + olen;
4116 a->z_expandable = exp;
4117
4118 return stbi__parse_zlib(a, parse_header);
4119 }
4120
4121 STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
4122 {
4123 stbi__zbuf a;
4124 char *p = (char *) stbi__malloc(initial_size);
4125 if (p == NULL) return NULL;
4126 a.zbuffer = (stbi_uc *) buffer;
4127 a.zbuffer_end = (stbi_uc *) buffer + len;
4128 if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
4129 if (outlen) *outlen = (int) (a.zout - a.zout_start);
4130 return a.zout_start;
4131 } else {
4132 STBI_FREE(a.zout_start);
4133 return NULL;
4134 }
4135 }
4136
4137 STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
4138 {
4139 return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
4140 }
4141
4142 STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
4143 {
4144 stbi__zbuf a;
4145 char *p = (char *) stbi__malloc(initial_size);
4146 if (p == NULL) return NULL;
4147 a.zbuffer = (stbi_uc *) buffer;
4148 a.zbuffer_end = (stbi_uc *) buffer + len;
4149 if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
4150 if (outlen) *outlen = (int) (a.zout - a.zout_start);
4151 return a.zout_start;
4152 } else {
4153 STBI_FREE(a.zout_start);
4154 return NULL;
4155 }
4156 }
4157
4158 STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
4159 {
4160 stbi__zbuf a;
4161 a.zbuffer = (stbi_uc *) ibuffer;
4162 a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4163 if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
4164 return (int) (a.zout - a.zout_start);
4165 else
4166 return -1;
4167 }
4168
4169 STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
4170 {
4171 stbi__zbuf a;
4172 char *p = (char *) stbi__malloc(16384);
4173 if (p == NULL) return NULL;
4174 a.zbuffer = (stbi_uc *) buffer;
4175 a.zbuffer_end = (stbi_uc *) buffer+len;
4176 if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
4177 if (outlen) *outlen = (int) (a.zout - a.zout_start);
4178 return a.zout_start;
4179 } else {
4180 STBI_FREE(a.zout_start);
4181 return NULL;
4182 }
4183 }
4184
4185 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
4186 {
4187 stbi__zbuf a;
4188 a.zbuffer = (stbi_uc *) ibuffer;
4189 a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
4190 if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
4191 return (int) (a.zout - a.zout_start);
4192 else
4193 return -1;
4194 }
4195 #endif
4196
4197 // public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18
4198 // simple implementation
4199 // - only 8-bit samples
4200 // - no CRC checking
4201 // - allocates lots of intermediate memory
4202 // - avoids problem of streaming data between subsystems
4203 // - avoids explicit window management
4204 // performance
4205 // - uses stb_zlib, a PD zlib implementation with fast huffman decoding
4206
4207 #ifndef STBI_NO_PNG
4208 typedef struct
4209 {
4210 stbi__uint32 length;
4211 stbi__uint32 type;
4212 } stbi__pngchunk;
4213
4214 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
4215 {
4216 stbi__pngchunk c;
4217 c.length = stbi__get32be(s);
4218 c.type = stbi__get32be(s);
4219 return c;
4220 }
4221
4222 static int stbi__check_png_header(stbi__context *s)
4223 {
4224 static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
4225 int i;
4226 for (i=0; i < 8; ++i)
4227 if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
4228 return 1;
4229 }
4230
4231 typedef struct
4232 {
4233 stbi__context *s;
4234 stbi_uc *idata, *expanded, *out;
4235 int depth;
4236 } stbi__png;
4237
4238
4239 enum {
4240 STBI__F_none=0,
4241 STBI__F_sub=1,
4242 STBI__F_up=2,
4243 STBI__F_avg=3,
4244 STBI__F_paeth=4,
4245 // synthetic filters used for first scanline to avoid needing a dummy row of 0s
4246 STBI__F_avg_first,
4247 STBI__F_paeth_first
4248 };
4249
4250 static stbi_uc first_row_filter[5] =
4251 {
4252 STBI__F_none,
4253 STBI__F_sub,
4254 STBI__F_none,
4255 STBI__F_avg_first,
4256 STBI__F_paeth_first
4257 };
4258
4259 static int stbi__paeth(int a, int b, int c)
4260 {
4261 int p = a + b - c;
4262 int pa = abs(p-a);
4263 int pb = abs(p-b);
4264 int pc = abs(p-c);
4265 if (pa <= pb && pa <= pc) return a;
4266 if (pb <= pc) return b;
4267 return c;
4268 }
4269
4270 static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
4271
4272 // create the png data from post-deflated data
4273 static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
4274 {
4275 int bytes = (depth == 16? 2 : 1);
4276 stbi__context *s = a->s;
4277 stbi__uint32 i,j,stride = x*out_n*bytes;
4278 stbi__uint32 img_len, img_width_bytes;
4279 int k;
4280 int img_n = s->img_n; // copy it into a local for later
4281
4282 int output_bytes = out_n*bytes;
4283 int filter_bytes = img_n*bytes;
4284 int width = x;
4285
4286 STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
4287 a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
4288 if (!a->out) return stbi__err("outofmem", "Out of memory");
4289
4290 img_width_bytes = (((img_n * x * depth) + 7) >> 3);
4291 img_len = (img_width_bytes + 1) * y;
4292 if (s->img_x == x && s->img_y == y) {
4293 if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG");
4294 } else { // interlaced:
4295 if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
4296 }
4297
4298 for (j=0; j < y; ++j) {
4299 stbi_uc *cur = a->out + stride*j;
4300 stbi_uc *prior = cur - stride;
4301 int filter = *raw++;
4302
4303 if (filter > 4)
4304 return stbi__err("invalid filter","Corrupt PNG");
4305
4306 if (depth < 8) {
4307 STBI_ASSERT(img_width_bytes <= x);
4308 cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
4309 filter_bytes = 1;
4310 width = img_width_bytes;
4311 }
4312
4313 // if first row, use special filter that doesn't sample previous row
4314 if (j == 0) filter = first_row_filter[filter];
4315
4316 // handle first byte explicitly
4317 for (k=0; k < filter_bytes; ++k) {
4318 switch (filter) {
4319 case STBI__F_none : cur[k] = raw[k]; break;
4320 case STBI__F_sub : cur[k] = raw[k]; break;
4321 case STBI__F_up : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
4322 case STBI__F_avg : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
4323 case STBI__F_paeth : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
4324 case STBI__F_avg_first : cur[k] = raw[k]; break;
4325 case STBI__F_paeth_first: cur[k] = raw[k]; break;
4326 }
4327 }
4328
4329 if (depth == 8) {
4330 if (img_n != out_n)
4331 cur[img_n] = 255; // first pixel
4332 raw += img_n;
4333 cur += out_n;
4334 prior += out_n;
4335 } else if (depth == 16) {
4336 if (img_n != out_n) {
4337 cur[filter_bytes] = 255; // first pixel top byte
4338 cur[filter_bytes+1] = 255; // first pixel bottom byte
4339 }
4340 raw += filter_bytes;
4341 cur += output_bytes;
4342 prior += output_bytes;
4343 } else {
4344 raw += 1;
4345 cur += 1;
4346 prior += 1;
4347 }
4348
4349 // this is a little gross, so that we don't switch per-pixel or per-component
4350 if (depth < 8 || img_n == out_n) {
4351 int nk = (width - 1)*filter_bytes;
4352 #define STBI__CASE(f) \
4353 case f: \
4354 for (k=0; k < nk; ++k)
4355 switch (filter) {
4356 // "none" filter turns into a memcpy here; make that explicit.
4357 case STBI__F_none: memcpy(cur, raw, nk); break;
4358 STBI__CASE(STBI__F_sub) { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
4359 STBI__CASE(STBI__F_up) { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4360 STBI__CASE(STBI__F_avg) { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
4361 STBI__CASE(STBI__F_paeth) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
4362 STBI__CASE(STBI__F_avg_first) { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
4363 STBI__CASE(STBI__F_paeth_first) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
4364 }
4365 #undef STBI__CASE
4366 raw += nk;
4367 } else {
4368 STBI_ASSERT(img_n+1 == out_n);
4369 #define STBI__CASE(f) \
4370 case f: \
4371 for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
4372 for (k=0; k < filter_bytes; ++k)
4373 switch (filter) {
4374 STBI__CASE(STBI__F_none) { cur[k] = raw[k]; } break;
4375 STBI__CASE(STBI__F_sub) { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
4376 STBI__CASE(STBI__F_up) { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
4377 STBI__CASE(STBI__F_avg) { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
4378 STBI__CASE(STBI__F_paeth) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
4379 STBI__CASE(STBI__F_avg_first) { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
4380 STBI__CASE(STBI__F_paeth_first) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
4381 }
4382 #undef STBI__CASE
4383
4384 // the loop above sets the high byte of the pixels' alpha, but for
4385 // 16 bit png files we also need the low byte set. we'll do that here.
4386 if (depth == 16) {
4387 cur = a->out + stride*j; // start at the beginning of the row again
4388 for (i=0; i < x; ++i,cur+=output_bytes) {
4389 cur[filter_bytes+1] = 255;
4390 }
4391 }
4392 }
4393 }
4394
4395 // we make a separate pass to expand bits to pixels; for performance,
4396 // this could run two scanlines behind the above code, so it won't
4397 // intefere with filtering but will still be in the cache.
4398 if (depth < 8) {
4399 for (j=0; j < y; ++j) {
4400 stbi_uc *cur = a->out + stride*j;
4401 stbi_uc *in = a->out + stride*j + x*out_n - img_width_bytes;
4402 // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
4403 // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop
4404 stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
4405
4406 // note that the final byte might overshoot and write more data than desired.
4407 // we can allocate enough data that this never writes out of memory, but it
4408 // could also overwrite the next scanline. can it overwrite non-empty data
4409 // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
4410 // so we need to explicitly clamp the final ones
4411
4412 if (depth == 4) {
4413 for (k=x*img_n; k >= 2; k-=2, ++in) {
4414 *cur++ = scale * ((*in >> 4) );
4415 *cur++ = scale * ((*in ) & 0x0f);
4416 }
4417 if (k > 0) *cur++ = scale * ((*in >> 4) );
4418 } else if (depth == 2) {
4419 for (k=x*img_n; k >= 4; k-=4, ++in) {
4420 *cur++ = scale * ((*in >> 6) );
4421 *cur++ = scale * ((*in >> 4) & 0x03);
4422 *cur++ = scale * ((*in >> 2) & 0x03);
4423 *cur++ = scale * ((*in ) & 0x03);
4424 }
4425 if (k > 0) *cur++ = scale * ((*in >> 6) );
4426 if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
4427 if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
4428 } else if (depth == 1) {
4429 for (k=x*img_n; k >= 8; k-=8, ++in) {
4430 *cur++ = scale * ((*in >> 7) );
4431 *cur++ = scale * ((*in >> 6) & 0x01);
4432 *cur++ = scale * ((*in >> 5) & 0x01);
4433 *cur++ = scale * ((*in >> 4) & 0x01);
4434 *cur++ = scale * ((*in >> 3) & 0x01);
4435 *cur++ = scale * ((*in >> 2) & 0x01);
4436 *cur++ = scale * ((*in >> 1) & 0x01);
4437 *cur++ = scale * ((*in ) & 0x01);
4438 }
4439 if (k > 0) *cur++ = scale * ((*in >> 7) );
4440 if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
4441 if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
4442 if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
4443 if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
4444 if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
4445 if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
4446 }
4447 if (img_n != out_n) {
4448 int q;
4449 // insert alpha = 255
4450 cur = a->out + stride*j;
4451 if (img_n == 1) {
4452 for (q=x-1; q >= 0; --q) {
4453 cur[q*2+1] = 255;
4454 cur[q*2+0] = cur[q];
4455 }
4456 } else {
4457 STBI_ASSERT(img_n == 3);
4458 for (q=x-1; q >= 0; --q) {
4459 cur[q*4+3] = 255;
4460 cur[q*4+2] = cur[q*3+2];
4461 cur[q*4+1] = cur[q*3+1];
4462 cur[q*4+0] = cur[q*3+0];
4463 }
4464 }
4465 }
4466 }
4467 } else if (depth == 16) {
4468 // force the image data from big-endian to platform-native.
4469 // this is done in a separate pass due to the decoding relying
4470 // on the data being untouched, but could probably be done
4471 // per-line during decode if care is taken.
4472 stbi_uc *cur = a->out;
4473 stbi__uint16 *cur16 = (stbi__uint16*)cur;
4474
4475 for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
4476 *cur16 = (cur[0] << 8) | cur[1];
4477 }
4478 }
4479
4480 return 1;
4481 }
4482
4483 static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
4484 {
4485 int bytes = (depth == 16 ? 2 : 1);
4486 int out_bytes = out_n * bytes;
4487 stbi_uc *final;
4488 int p;
4489 if (!interlaced)
4490 return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
4491
4492 // de-interlacing
4493 final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
4494 for (p=0; p < 7; ++p) {
4495 int xorig[] = { 0,4,0,2,0,1,0 };
4496 int yorig[] = { 0,0,4,0,2,0,1 };
4497 int xspc[] = { 8,8,4,4,2,2,1 };
4498 int yspc[] = { 8,8,8,4,4,2,2 };
4499 int i,j,x,y;
4500 // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
4501 x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
4502 y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
4503 if (x && y) {
4504 stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
4505 if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
4506 STBI_FREE(final);
4507 return 0;
4508 }
4509 for (j=0; j < y; ++j) {
4510 for (i=0; i < x; ++i) {
4511 int out_y = j*yspc[p]+yorig[p];
4512 int out_x = i*xspc[p]+xorig[p];
4513 memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
4514 a->out + (j*x+i)*out_bytes, out_bytes);
4515 }
4516 }
4517 STBI_FREE(a->out);
4518 image_data += img_len;
4519 image_data_len -= img_len;
4520 }
4521 }
4522 a->out = final;
4523
4524 return 1;
4525 }
4526
4527 static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
4528 {
4529 stbi__context *s = z->s;
4530 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4531 stbi_uc *p = z->out;
4532
4533 // compute color-based transparency, assuming we've
4534 // already got 255 as the alpha value in the output
4535 STBI_ASSERT(out_n == 2 || out_n == 4);
4536
4537 if (out_n == 2) {
4538 for (i=0; i < pixel_count; ++i) {
4539 p[1] = (p[0] == tc[0] ? 0 : 255);
4540 p += 2;
4541 }
4542 } else {
4543 for (i=0; i < pixel_count; ++i) {
4544 if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4545 p[3] = 0;
4546 p += 4;
4547 }
4548 }
4549 return 1;
4550 }
4551
4552 static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
4553 {
4554 stbi__context *s = z->s;
4555 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4556 stbi__uint16 *p = (stbi__uint16*) z->out;
4557
4558 // compute color-based transparency, assuming we've
4559 // already got 65535 as the alpha value in the output
4560 STBI_ASSERT(out_n == 2 || out_n == 4);
4561
4562 if (out_n == 2) {
4563 for (i = 0; i < pixel_count; ++i) {
4564 p[1] = (p[0] == tc[0] ? 0 : 65535);
4565 p += 2;
4566 }
4567 } else {
4568 for (i = 0; i < pixel_count; ++i) {
4569 if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
4570 p[3] = 0;
4571 p += 4;
4572 }
4573 }
4574 return 1;
4575 }
4576
4577 static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
4578 {
4579 stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
4580 stbi_uc *p, *temp_out, *orig = a->out;
4581
4582 p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
4583 if (p == NULL) return stbi__err("outofmem", "Out of memory");
4584
4585 // between here and free(out) below, exitting would leak
4586 temp_out = p;
4587
4588 if (pal_img_n == 3) {
4589 for (i=0; i < pixel_count; ++i) {
4590 int n = orig[i]*4;
4591 p[0] = palette[n ];
4592 p[1] = palette[n+1];
4593 p[2] = palette[n+2];
4594 p += 3;
4595 }
4596 } else {
4597 for (i=0; i < pixel_count; ++i) {
4598 int n = orig[i]*4;
4599 p[0] = palette[n ];
4600 p[1] = palette[n+1];
4601 p[2] = palette[n+2];
4602 p[3] = palette[n+3];
4603 p += 4;
4604 }
4605 }
4606 STBI_FREE(a->out);
4607 a->out = temp_out;
4608
4609 STBI_NOTUSED(len);
4610
4611 return 1;
4612 }
4613
4614 static int stbi__unpremultiply_on_load = 0;
4615 static int stbi__de_iphone_flag = 0;
4616
4617 STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
4618 {
4619 stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply;
4620 }
4621
4622 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
4623 {
4624 stbi__de_iphone_flag = flag_true_if_should_convert;
4625 }
4626
4627 static void stbi__de_iphone(stbi__png *z)
4628 {
4629 stbi__context *s = z->s;
4630 stbi__uint32 i, pixel_count = s->img_x * s->img_y;
4631 stbi_uc *p = z->out;
4632
4633 if (s->img_out_n == 3) { // convert bgr to rgb
4634 for (i=0; i < pixel_count; ++i) {
4635 stbi_uc t = p[0];
4636 p[0] = p[2];
4637 p[2] = t;
4638 p += 3;
4639 }
4640 } else {
4641 STBI_ASSERT(s->img_out_n == 4);
4642 if (stbi__unpremultiply_on_load) {
4643 // convert bgr to rgb and unpremultiply
4644 for (i=0; i < pixel_count; ++i) {
4645 stbi_uc a = p[3];
4646 stbi_uc t = p[0];
4647 if (a) {
4648 p[0] = p[2] * 255 / a;
4649 p[1] = p[1] * 255 / a;
4650 p[2] = t * 255 / a;
4651 } else {
4652 p[0] = p[2];
4653 p[2] = t;
4654 }
4655 p += 4;
4656 }
4657 } else {
4658 // convert bgr to rgb
4659 for (i=0; i < pixel_count; ++i) {
4660 stbi_uc t = p[0];
4661 p[0] = p[2];
4662 p[2] = t;
4663 p += 4;
4664 }
4665 }
4666 }
4667 }
4668
4669 #define STBI__PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d))
4670
4671 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
4672 {
4673 stbi_uc palette[1024], pal_img_n=0;
4674 stbi_uc has_trans=0, tc[3];
4675 stbi__uint16 tc16[3];
4676 stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
4677 int first=1,k,interlace=0, color=0, is_iphone=0;
4678 stbi__context *s = z->s;
4679
4680 z->expanded = NULL;
4681 z->idata = NULL;
4682 z->out = NULL;
4683
4684 if (!stbi__check_png_header(s)) return 0;
4685
4686 if (scan == STBI__SCAN_type) return 1;
4687
4688 for (;;) {
4689 stbi__pngchunk c = stbi__get_chunk_header(s);
4690 switch (c.type) {
4691 case STBI__PNG_TYPE('C','g','B','I'):
4692 is_iphone = 1;
4693 stbi__skip(s, c.length);
4694 break;
4695 case STBI__PNG_TYPE('I','H','D','R'): {
4696 int comp,filter;
4697 if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
4698 first = 0;
4699 if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
4700 s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4701 s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
4702 z->depth = stbi__get8(s); if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16) return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
4703 color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype","Corrupt PNG");
4704 if (color == 3 && z->depth == 16) return stbi__err("bad ctype","Corrupt PNG");
4705 if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
4706 comp = stbi__get8(s); if (comp) return stbi__err("bad comp method","Corrupt PNG");
4707 filter= stbi__get8(s); if (filter) return stbi__err("bad filter method","Corrupt PNG");
4708 interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
4709 if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
4710 if (!pal_img_n) {
4711 s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
4712 if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
4713 if (scan == STBI__SCAN_header) return 1;
4714 } else {
4715 // if paletted, then pal_n is our final components, and
4716 // img_n is # components to decompress/filter.
4717 s->img_n = 1;
4718 if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
4719 // if SCAN_header, have to scan to see if we have a tRNS
4720 }
4721 break;
4722 }
4723
4724 case STBI__PNG_TYPE('P','L','T','E'): {
4725 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4726 if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
4727 pal_len = c.length / 3;
4728 if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
4729 for (i=0; i < pal_len; ++i) {
4730 palette[i*4+0] = stbi__get8(s);
4731 palette[i*4+1] = stbi__get8(s);
4732 palette[i*4+2] = stbi__get8(s);
4733 palette[i*4+3] = 255;
4734 }
4735 break;
4736 }
4737
4738 case STBI__PNG_TYPE('t','R','N','S'): {
4739 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4740 if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
4741 if (pal_img_n) {
4742 if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
4743 if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
4744 if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
4745 pal_img_n = 4;
4746 for (i=0; i < c.length; ++i)
4747 palette[i*4+3] = stbi__get8(s);
4748 } else {
4749 if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
4750 if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
4751 has_trans = 1;
4752 if (z->depth == 16) {
4753 for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
4754 } else {
4755 for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
4756 }
4757 }
4758 break;
4759 }
4760
4761 case STBI__PNG_TYPE('I','D','A','T'): {
4762 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4763 if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
4764 if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
4765 if ((int)(ioff + c.length) < (int)ioff) return 0;
4766 if (ioff + c.length > idata_limit) {
4767 stbi__uint32 idata_limit_old = idata_limit;
4768 stbi_uc *p;
4769 if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
4770 while (ioff + c.length > idata_limit)
4771 idata_limit *= 2;
4772 STBI_NOTUSED(idata_limit_old);
4773 p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
4774 z->idata = p;
4775 }
4776 if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
4777 ioff += c.length;
4778 break;
4779 }
4780
4781 case STBI__PNG_TYPE('I','E','N','D'): {
4782 stbi__uint32 raw_len, bpl;
4783 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4784 if (scan != STBI__SCAN_load) return 1;
4785 if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
4786 // initial guess for decoded data size to avoid unnecessary reallocs
4787 bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
4788 raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
4789 z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
4790 if (z->expanded == NULL) return 0; // zlib should set error
4791 STBI_FREE(z->idata); z->idata = NULL;
4792 if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
4793 s->img_out_n = s->img_n+1;
4794 else
4795 s->img_out_n = s->img_n;
4796 if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
4797 if (has_trans) {
4798 if (z->depth == 16) {
4799 if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
4800 } else {
4801 if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
4802 }
4803 }
4804 if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
4805 stbi__de_iphone(z);
4806 if (pal_img_n) {
4807 // pal_img_n == 3 or 4
4808 s->img_n = pal_img_n; // record the actual colors we had
4809 s->img_out_n = pal_img_n;
4810 if (req_comp >= 3) s->img_out_n = req_comp;
4811 if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
4812 return 0;
4813 }
4814 STBI_FREE(z->expanded); z->expanded = NULL;
4815 return 1;
4816 }
4817
4818 default:
4819 // if critical, fail
4820 if (first) return stbi__err("first not IHDR", "Corrupt PNG");
4821 if ((c.type & (1 << 29)) == 0) {
4822 #ifndef STBI_NO_FAILURE_STRINGS
4823 // not threadsafe
4824 static char invalid_chunk[] = "XXXX PNG chunk not known";
4825 invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
4826 invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
4827 invalid_chunk[2] = STBI__BYTECAST(c.type >> 8);
4828 invalid_chunk[3] = STBI__BYTECAST(c.type >> 0);
4829 #endif
4830 return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
4831 }
4832 stbi__skip(s, c.length);
4833 break;
4834 }
4835 // end of PNG chunk, read and skip CRC
4836 stbi__get32be(s);
4837 }
4838 }
4839
4840 static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
4841 {
4842 void *result=NULL;
4843 if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
4844 if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
4845 if (p->depth < 8)
4846 ri->bits_per_channel = 8;
4847 else
4848 ri->bits_per_channel = p->depth;
4849 result = p->out;
4850 p->out = NULL;
4851 if (req_comp && req_comp != p->s->img_out_n) {
4852 if (ri->bits_per_channel == 8)
4853 result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4854 else
4855 result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
4856 p->s->img_out_n = req_comp;
4857 if (result == NULL) return result;
4858 }
4859 *x = p->s->img_x;
4860 *y = p->s->img_y;
4861 if (n) *n = p->s->img_n;
4862 }
4863 STBI_FREE(p->out); p->out = NULL;
4864 STBI_FREE(p->expanded); p->expanded = NULL;
4865 STBI_FREE(p->idata); p->idata = NULL;
4866
4867 return result;
4868 }
4869
4870 static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
4871 {
4872 stbi__png p;
4873 p.s = s;
4874 return stbi__do_png(&p, x,y,comp,req_comp, ri);
4875 }
4876
4877 static int stbi__png_test(stbi__context *s)
4878 {
4879 int r;
4880 r = stbi__check_png_header(s);
4881 stbi__rewind(s);
4882 return r;
4883 }
4884
4885 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
4886 {
4887 if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
4888 stbi__rewind( p->s );
4889 return 0;
4890 }
4891 if (x) *x = p->s->img_x;
4892 if (y) *y = p->s->img_y;
4893 if (comp) *comp = p->s->img_n;
4894 return 1;
4895 }
4896
4897 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
4898 {
4899 stbi__png p;
4900 p.s = s;
4901 return stbi__png_info_raw(&p, x, y, comp);
4902 }
4903 #endif
4904
4905 // Microsoft/Windows BMP image
4906
4907 #ifndef STBI_NO_BMP
4908 static int stbi__bmp_test_raw(stbi__context *s)
4909 {
4910 int r;
4911 int sz;
4912 if (stbi__get8(s) != 'B') return 0;
4913 if (stbi__get8(s) != 'M') return 0;
4914 stbi__get32le(s); // discard filesize
4915 stbi__get16le(s); // discard reserved
4916 stbi__get16le(s); // discard reserved
4917 stbi__get32le(s); // discard data offset
4918 sz = stbi__get32le(s);
4919 r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
4920 return r;
4921 }
4922
4923 static int stbi__bmp_test(stbi__context *s)
4924 {
4925 int r = stbi__bmp_test_raw(s);
4926 stbi__rewind(s);
4927 return r;
4928 }
4929
4930
4931 // returns 0..31 for the highest set bit
4932 static int stbi__high_bit(unsigned int z)
4933 {
4934 int n=0;
4935 if (z == 0) return -1;
4936 if (z >= 0x10000) n += 16, z >>= 16;
4937 if (z >= 0x00100) n += 8, z >>= 8;
4938 if (z >= 0x00010) n += 4, z >>= 4;
4939 if (z >= 0x00004) n += 2, z >>= 2;
4940 if (z >= 0x00002) n += 1, z >>= 1;
4941 return n;
4942 }
4943
4944 static int stbi__bitcount(unsigned int a)
4945 {
4946 a = (a & 0x55555555) + ((a >> 1) & 0x55555555); // max 2
4947 a = (a & 0x33333333) + ((a >> 2) & 0x33333333); // max 4
4948 a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
4949 a = (a + (a >> 8)); // max 16 per 8 bits
4950 a = (a + (a >> 16)); // max 32 per 8 bits
4951 return a & 0xff;
4952 }
4953
4954 static int stbi__shiftsigned(int v, int shift, int bits)
4955 {
4956 int result;
4957 int z=0;
4958
4959 if (shift < 0) v <<= -shift;
4960 else v >>= shift;
4961 result = v;
4962
4963 z = bits;
4964 while (z < 8) {
4965 result += v >> z;
4966 z += bits;
4967 }
4968 return result;
4969 }
4970
4971 typedef struct
4972 {
4973 int bpp, offset, hsz;
4974 unsigned int mr,mg,mb,ma, all_a;
4975 } stbi__bmp_data;
4976
4977 static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
4978 {
4979 int hsz;
4980 if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
4981 stbi__get32le(s); // discard filesize
4982 stbi__get16le(s); // discard reserved
4983 stbi__get16le(s); // discard reserved
4984 info->offset = stbi__get32le(s);
4985 info->hsz = hsz = stbi__get32le(s);
4986 info->mr = info->mg = info->mb = info->ma = 0;
4987
4988 if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
4989 if (hsz == 12) {
4990 s->img_x = stbi__get16le(s);
4991 s->img_y = stbi__get16le(s);
4992 } else {
4993 s->img_x = stbi__get32le(s);
4994 s->img_y = stbi__get32le(s);
4995 }
4996 if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
4997 info->bpp = stbi__get16le(s);
4998 if (info->bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit");
4999 if (hsz != 12) {
5000 int compress = stbi__get32le(s);
5001 if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
5002 stbi__get32le(s); // discard sizeof
5003 stbi__get32le(s); // discard hres
5004 stbi__get32le(s); // discard vres
5005 stbi__get32le(s); // discard colorsused
5006 stbi__get32le(s); // discard max important
5007 if (hsz == 40 || hsz == 56) {
5008 if (hsz == 56) {
5009 stbi__get32le(s);
5010 stbi__get32le(s);
5011 stbi__get32le(s);
5012 stbi__get32le(s);
5013 }
5014 if (info->bpp == 16 || info->bpp == 32) {
5015 if (compress == 0) {
5016 if (info->bpp == 32) {
5017 info->mr = 0xffu << 16;
5018 info->mg = 0xffu << 8;
5019 info->mb = 0xffu << 0;
5020 info->ma = 0xffu << 24;
5021 info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
5022 } else {
5023 info->mr = 31u << 10;
5024 info->mg = 31u << 5;
5025 info->mb = 31u << 0;
5026 }
5027 } else if (compress == 3) {
5028 info->mr = stbi__get32le(s);
5029 info->mg = stbi__get32le(s);
5030 info->mb = stbi__get32le(s);
5031 // not documented, but generated by photoshop and handled by mspaint
5032 if (info->mr == info->mg && info->mg == info->mb) {
5033 // ?!?!?
5034 return stbi__errpuc("bad BMP", "bad BMP");
5035 }
5036 } else
5037 return stbi__errpuc("bad BMP", "bad BMP");
5038 }
5039 } else {
5040 int i;
5041 if (hsz != 108 && hsz != 124)
5042 return stbi__errpuc("bad BMP", "bad BMP");
5043 info->mr = stbi__get32le(s);
5044 info->mg = stbi__get32le(s);
5045 info->mb = stbi__get32le(s);
5046 info->ma = stbi__get32le(s);
5047 stbi__get32le(s); // discard color space
5048 for (i=0; i < 12; ++i)
5049 stbi__get32le(s); // discard color space parameters
5050 if (hsz == 124) {
5051 stbi__get32le(s); // discard rendering intent
5052 stbi__get32le(s); // discard offset of profile data
5053 stbi__get32le(s); // discard size of profile data
5054 stbi__get32le(s); // discard reserved
5055 }
5056 }
5057 }
5058 return (void *) 1;
5059 }
5060
5061
5062 static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5063 {
5064 stbi_uc *out;
5065 unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
5066 stbi_uc pal[256][4];
5067 int psize=0,i,j,width;
5068 int flip_vertically, pad, target;
5069 stbi__bmp_data info;
5070 STBI_NOTUSED(ri);
5071
5072 info.all_a = 255;
5073 if (stbi__bmp_parse_header(s, &info) == NULL)
5074 return NULL; // error code already set
5075
5076 flip_vertically = ((int) s->img_y) > 0;
5077 s->img_y = abs((int) s->img_y);
5078
5079 mr = info.mr;
5080 mg = info.mg;
5081 mb = info.mb;
5082 ma = info.ma;
5083 all_a = info.all_a;
5084
5085 if (info.hsz == 12) {
5086 if (info.bpp < 24)
5087 psize = (info.offset - 14 - 24) / 3;
5088 } else {
5089 if (info.bpp < 16)
5090 psize = (info.offset - 14 - info.hsz) >> 2;
5091 }
5092
5093 s->img_n = ma ? 4 : 3;
5094 if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
5095 target = req_comp;
5096 else
5097 target = s->img_n; // if they want monochrome, we'll post-convert
5098
5099 // sanity-check size
5100 if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
5101 return stbi__errpuc("too large", "Corrupt BMP");
5102
5103 out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
5104 if (!out) return stbi__errpuc("outofmem", "Out of memory");
5105 if (info.bpp < 16) {
5106 int z=0;
5107 if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
5108 for (i=0; i < psize; ++i) {
5109 pal[i][2] = stbi__get8(s);
5110 pal[i][1] = stbi__get8(s);
5111 pal[i][0] = stbi__get8(s);
5112 if (info.hsz != 12) stbi__get8(s);
5113 pal[i][3] = 255;
5114 }
5115 stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
5116 if (info.bpp == 4) width = (s->img_x + 1) >> 1;
5117 else if (info.bpp == 8) width = s->img_x;
5118 else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
5119 pad = (-width)&3;
5120 for (j=0; j < (int) s->img_y; ++j) {
5121 for (i=0; i < (int) s->img_x; i += 2) {
5122 int v=stbi__get8(s),v2=0;
5123 if (info.bpp == 4) {
5124 v2 = v & 15;
5125 v >>= 4;
5126 }
5127 out[z++] = pal[v][0];
5128 out[z++] = pal[v][1];
5129 out[z++] = pal[v][2];
5130 if (target == 4) out[z++] = 255;
5131 if (i+1 == (int) s->img_x) break;
5132 v = (info.bpp == 8) ? stbi__get8(s) : v2;
5133 out[z++] = pal[v][0];
5134 out[z++] = pal[v][1];
5135 out[z++] = pal[v][2];
5136 if (target == 4) out[z++] = 255;
5137 }
5138 stbi__skip(s, pad);
5139 }
5140 } else {
5141 int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
5142 int z = 0;
5143 int easy=0;
5144 stbi__skip(s, info.offset - 14 - info.hsz);
5145 if (info.bpp == 24) width = 3 * s->img_x;
5146 else if (info.bpp == 16) width = 2*s->img_x;
5147 else /* bpp = 32 and pad = 0 */ width=0;
5148 pad = (-width) & 3;
5149 if (info.bpp == 24) {
5150 easy = 1;
5151 } else if (info.bpp == 32) {
5152 if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
5153 easy = 2;
5154 }
5155 if (!easy) {
5156 if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
5157 // right shift amt to put high bit in position #7
5158 rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
5159 gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
5160 bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
5161 ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
5162 }
5163 for (j=0; j < (int) s->img_y; ++j) {
5164 if (easy) {
5165 for (i=0; i < (int) s->img_x; ++i) {
5166 unsigned char a;
5167 out[z+2] = stbi__get8(s);
5168 out[z+1] = stbi__get8(s);
5169 out[z+0] = stbi__get8(s);
5170 z += 3;
5171 a = (easy == 2 ? stbi__get8(s) : 255);
5172 all_a |= a;
5173 if (target == 4) out[z++] = a;
5174 }
5175 } else {
5176 int bpp = info.bpp;
5177 for (i=0; i < (int) s->img_x; ++i) {
5178 stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
5179 int a;
5180 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
5181 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
5182 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
5183 a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
5184 all_a |= a;
5185 if (target == 4) out[z++] = STBI__BYTECAST(a);
5186 }
5187 }
5188 stbi__skip(s, pad);
5189 }
5190 }
5191
5192 // if alpha channel is all 0s, replace with all 255s
5193 if (target == 4 && all_a == 0)
5194 for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
5195 out[i] = 255;
5196
5197 if (flip_vertically) {
5198 stbi_uc t;
5199 for (j=0; j < (int) s->img_y>>1; ++j) {
5200 stbi_uc *p1 = out + j *s->img_x*target;
5201 stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
5202 for (i=0; i < (int) s->img_x*target; ++i) {
5203 t = p1[i], p1[i] = p2[i], p2[i] = t;
5204 }
5205 }
5206 }
5207
5208 if (req_comp && req_comp != target) {
5209 out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
5210 if (out == NULL) return out; // stbi__convert_format frees input on failure
5211 }
5212
5213 *x = s->img_x;
5214 *y = s->img_y;
5215 if (comp) *comp = s->img_n;
5216 return out;
5217 }
5218 #endif
5219
5220 // Targa Truevision - TGA
5221 // by Jonathan Dummer
5222 #ifndef STBI_NO_TGA
5223 // returns STBI_rgb or whatever, 0 on error
5224 static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
5225 {
5226 // only RGB or RGBA (incl. 16bit) or grey allowed
5227 if(is_rgb16) *is_rgb16 = 0;
5228 switch(bits_per_pixel) {
5229 case 8: return STBI_grey;
5230 case 16: if(is_grey) return STBI_grey_alpha;
5231 // else: fall-through
5232 case 15: if(is_rgb16) *is_rgb16 = 1;
5233 return STBI_rgb;
5234 case 24: // fall-through
5235 case 32: return bits_per_pixel/8;
5236 default: return 0;
5237 }
5238 }
5239
5240 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
5241 {
5242 int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
5243 int sz, tga_colormap_type;
5244 stbi__get8(s); // discard Offset
5245 tga_colormap_type = stbi__get8(s); // colormap type
5246 if( tga_colormap_type > 1 ) {
5247 stbi__rewind(s);
5248 return 0; // only RGB or indexed allowed
5249 }
5250 tga_image_type = stbi__get8(s); // image type
5251 if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
5252 if (tga_image_type != 1 && tga_image_type != 9) {
5253 stbi__rewind(s);
5254 return 0;
5255 }
5256 stbi__skip(s,4); // skip index of first colormap entry and number of entries
5257 sz = stbi__get8(s); // check bits per palette color entry
5258 if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
5259 stbi__rewind(s);
5260 return 0;
5261 }
5262 stbi__skip(s,4); // skip image x and y origin
5263 tga_colormap_bpp = sz;
5264 } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
5265 if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
5266 stbi__rewind(s);
5267 return 0; // only RGB or grey allowed, +/- RLE
5268 }
5269 stbi__skip(s,9); // skip colormap specification and image x/y origin
5270 tga_colormap_bpp = 0;
5271 }
5272 tga_w = stbi__get16le(s);
5273 if( tga_w < 1 ) {
5274 stbi__rewind(s);
5275 return 0; // test width
5276 }
5277 tga_h = stbi__get16le(s);
5278 if( tga_h < 1 ) {
5279 stbi__rewind(s);
5280 return 0; // test height
5281 }
5282 tga_bits_per_pixel = stbi__get8(s); // bits per pixel
5283 stbi__get8(s); // ignore alpha bits
5284 if (tga_colormap_bpp != 0) {
5285 if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
5286 // when using a colormap, tga_bits_per_pixel is the size of the indexes
5287 // I don't think anything but 8 or 16bit indexes makes sense
5288 stbi__rewind(s);
5289 return 0;
5290 }
5291 tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
5292 } else {
5293 tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
5294 }
5295 if(!tga_comp) {
5296 stbi__rewind(s);
5297 return 0;
5298 }
5299 if (x) *x = tga_w;
5300 if (y) *y = tga_h;
5301 if (comp) *comp = tga_comp;
5302 return 1; // seems to have passed everything
5303 }
5304
5305 static int stbi__tga_test(stbi__context *s)
5306 {
5307 int res = 0;
5308 int sz, tga_color_type;
5309 stbi__get8(s); // discard Offset
5310 tga_color_type = stbi__get8(s); // color type
5311 if ( tga_color_type > 1 ) goto errorEnd; // only RGB or indexed allowed
5312 sz = stbi__get8(s); // image type
5313 if ( tga_color_type == 1 ) { // colormapped (paletted) image
5314 if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
5315 stbi__skip(s,4); // skip index of first colormap entry and number of entries
5316 sz = stbi__get8(s); // check bits per palette color entry
5317 if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5318 stbi__skip(s,4); // skip image x and y origin
5319 } else { // "normal" image w/o colormap
5320 if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
5321 stbi__skip(s,9); // skip colormap specification and image x/y origin
5322 }
5323 if ( stbi__get16le(s) < 1 ) goto errorEnd; // test width
5324 if ( stbi__get16le(s) < 1 ) goto errorEnd; // test height
5325 sz = stbi__get8(s); // bits per pixel
5326 if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
5327 if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
5328
5329 res = 1; // if we got this far, everything's good and we can return 1 instead of 0
5330
5331 errorEnd:
5332 stbi__rewind(s);
5333 return res;
5334 }
5335
5336 // read 16bit value and convert to 24bit RGB
5337 static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
5338 {
5339 stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
5340 stbi__uint16 fiveBitMask = 31;
5341 // we have 3 channels with 5bits each
5342 int r = (px >> 10) & fiveBitMask;
5343 int g = (px >> 5) & fiveBitMask;
5344 int b = px & fiveBitMask;
5345 // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
5346 out[0] = (stbi_uc)((r * 255)/31);
5347 out[1] = (stbi_uc)((g * 255)/31);
5348 out[2] = (stbi_uc)((b * 255)/31);
5349
5350 // some people claim that the most significant bit might be used for alpha
5351 // (possibly if an alpha-bit is set in the "image descriptor byte")
5352 // but that only made 16bit test images completely translucent..
5353 // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
5354 }
5355
5356 static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
5357 {
5358 // read in the TGA header stuff
5359 int tga_offset = stbi__get8(s);
5360 int tga_indexed = stbi__get8(s);
5361 int tga_image_type = stbi__get8(s);
5362 int tga_is_RLE = 0;
5363 int tga_palette_start = stbi__get16le(s);
5364 int tga_palette_len = stbi__get16le(s);
5365 int tga_palette_bits = stbi__get8(s);
5366 int tga_x_origin = stbi__get16le(s);
5367 int tga_y_origin = stbi__get16le(s);
5368 int tga_width = stbi__get16le(s);
5369 int tga_height = stbi__get16le(s);
5370 int tga_bits_per_pixel = stbi__get8(s);
5371 int tga_comp, tga_rgb16=0;
5372 int tga_inverted = stbi__get8(s);
5373 // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
5374 // image data
5375 unsigned char *tga_data;
5376 unsigned char *tga_palette = NULL;
5377 int i, j;
5378 unsigned char raw_data[4] = {0};
5379 int RLE_count = 0;
5380 int RLE_repeating = 0;
5381 int read_next_pixel = 1;
5382 STBI_NOTUSED(ri);
5383
5384 // do a tiny bit of precessing
5385 if ( tga_image_type >= 8 )
5386 {
5387 tga_image_type -= 8;
5388 tga_is_RLE = 1;
5389 }
5390 tga_inverted = 1 - ((tga_inverted >> 5) & 1);
5391
5392 // If I'm paletted, then I'll use the number of bits from the palette
5393 if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
5394 else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
5395
5396 if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
5397 return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
5398
5399 // tga info
5400 *x = tga_width;
5401 *y = tga_height;
5402 if (comp) *comp = tga_comp;
5403
5404 if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
5405 return stbi__errpuc("too large", "Corrupt TGA");
5406
5407 tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
5408 if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
5409
5410 // skip to the data's starting position (offset usually = 0)
5411 stbi__skip(s, tga_offset );
5412
5413 if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
5414 for (i=0; i < tga_height; ++i) {
5415 int row = tga_inverted ? tga_height -i - 1 : i;
5416 stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
5417 stbi__getn(s, tga_row, tga_width * tga_comp);
5418 }
5419 } else {
5420 // do I need to load a palette?
5421 if ( tga_indexed)
5422 {
5423 // any data to skip? (offset usually = 0)
5424 stbi__skip(s, tga_palette_start );
5425 // load the palette
5426 tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
5427 if (!tga_palette) {
5428 STBI_FREE(tga_data);
5429 return stbi__errpuc("outofmem", "Out of memory");
5430 }
5431 if (tga_rgb16) {
5432 stbi_uc *pal_entry = tga_palette;
5433 STBI_ASSERT(tga_comp == STBI_rgb);
5434 for (i=0; i < tga_palette_len; ++i) {
5435 stbi__tga_read_rgb16(s, pal_entry);
5436 pal_entry += tga_comp;
5437 }
5438 } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
5439 STBI_FREE(tga_data);
5440 STBI_FREE(tga_palette);
5441 return stbi__errpuc("bad palette", "Corrupt TGA");
5442 }
5443 }
5444 // load the data
5445 for (i=0; i < tga_width * tga_height; ++i)
5446 {
5447 // if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
5448 if ( tga_is_RLE )
5449 {
5450 if ( RLE_count == 0 )
5451 {
5452 // yep, get the next byte as a RLE command
5453 int RLE_cmd = stbi__get8(s);
5454 RLE_count = 1 + (RLE_cmd & 127);
5455 RLE_repeating = RLE_cmd >> 7;
5456 read_next_pixel = 1;
5457 } else if ( !RLE_repeating )
5458 {
5459 read_next_pixel = 1;
5460 }
5461 } else
5462 {
5463 read_next_pixel = 1;
5464 }
5465 // OK, if I need to read a pixel, do it now
5466 if ( read_next_pixel )
5467 {
5468 // load however much data we did have
5469 if ( tga_indexed )
5470 {
5471 // read in index, then perform the lookup
5472 int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
5473 if ( pal_idx >= tga_palette_len ) {
5474 // invalid index
5475 pal_idx = 0;
5476 }
5477 pal_idx *= tga_comp;
5478 for (j = 0; j < tga_comp; ++j) {
5479 raw_data[j] = tga_palette[pal_idx+j];
5480 }
5481 } else if(tga_rgb16) {
5482 STBI_ASSERT(tga_comp == STBI_rgb);
5483 stbi__tga_read_rgb16(s, raw_data);
5484 } else {
5485 // read in the data raw
5486 for (j = 0; j < tga_comp; ++j) {
5487 raw_data[j] = stbi__get8(s);
5488 }
5489 }
5490 // clear the reading flag for the next pixel
5491 read_next_pixel = 0;
5492 } // end of reading a pixel
5493
5494 // copy data
5495 for (j = 0; j < tga_comp; ++j)
5496 tga_data[i*tga_comp+j] = raw_data[j];
5497
5498 // in case we're in RLE mode, keep counting down
5499 --RLE_count;
5500 }
5501 // do I need to invert the image?
5502 if ( tga_inverted )
5503 {
5504 for (j = 0; j*2 < tga_height; ++j)
5505 {
5506 int index1 = j * tga_width * tga_comp;
5507 int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
5508 for (i = tga_width * tga_comp; i > 0; --i)
5509 {
5510 unsigned char temp = tga_data[index1];
5511 tga_data[index1] = tga_data[index2];
5512 tga_data[index2] = temp;
5513 ++index1;
5514 ++index2;
5515 }
5516 }
5517 }
5518 // clear my palette, if I had one
5519 if ( tga_palette != NULL )
5520 {
5521 STBI_FREE( tga_palette );
5522 }
5523 }
5524
5525 // swap RGB - if the source data was RGB16, it already is in the right order
5526 if (tga_comp >= 3 && !tga_rgb16)
5527 {
5528 unsigned char* tga_pixel = tga_data;
5529 for (i=0; i < tga_width * tga_height; ++i)
5530 {
5531 unsigned char temp = tga_pixel[0];
5532 tga_pixel[0] = tga_pixel[2];
5533 tga_pixel[2] = temp;
5534 tga_pixel += tga_comp;
5535 }
5536 }
5537
5538 // convert to target component count
5539 if (req_comp && req_comp != tga_comp)
5540 tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
5541
5542 // the things I do to get rid of an error message, and yet keep
5543 // Microsoft's C compilers happy... [8^(
5544 tga_palette_start = tga_palette_len = tga_palette_bits =
5545 tga_x_origin = tga_y_origin = 0;
5546 // OK, done
5547 return tga_data;
5548 }
5549 #endif
5550
5551 // *************************************************************************************************
5552 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
5553
5554 #ifndef STBI_NO_PSD
5555 static int stbi__psd_test(stbi__context *s)
5556 {
5557 int r = (stbi__get32be(s) == 0x38425053);
5558 stbi__rewind(s);
5559 return r;
5560 }
5561
5562 static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
5563 {
5564 int count, nleft, len;
5565
5566 count = 0;
5567 while ((nleft = pixelCount - count) > 0) {
5568 len = stbi__get8(s);
5569 if (len == 128) {
5570 // No-op.
5571 } else if (len < 128) {
5572 // Copy next len+1 bytes literally.
5573 len++;
5574 if (len > nleft) return 0; // corrupt data
5575 count += len;
5576 while (len) {
5577 *p = stbi__get8(s);
5578 p += 4;
5579 len--;
5580 }
5581 } else if (len > 128) {
5582 stbi_uc val;
5583 // Next -len+1 bytes in the dest are replicated from next source byte.
5584 // (Interpret len as a negative 8-bit int.)
5585 len = 257 - len;
5586 if (len > nleft) return 0; // corrupt data
5587 val = stbi__get8(s);
5588 count += len;
5589 while (len) {
5590 *p = val;
5591 p += 4;
5592 len--;
5593 }
5594 }
5595 }
5596
5597 return 1;
5598 }
5599
5600 static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
5601 {
5602 int pixelCount;
5603 int channelCount, compression;
5604 int channel, i;
5605 int bitdepth;
5606 int w,h;
5607 stbi_uc *out;
5608 STBI_NOTUSED(ri);
5609
5610 // Check identifier
5611 if (stbi__get32be(s) != 0x38425053) // "8BPS"
5612 return stbi__errpuc("not PSD", "Corrupt PSD image");
5613
5614 // Check file type version.
5615 if (stbi__get16be(s) != 1)
5616 return stbi__errpuc("wrong version", "Unsupported version of PSD image");
5617
5618 // Skip 6 reserved bytes.
5619 stbi__skip(s, 6 );
5620
5621 // Read the number of channels (R, G, B, A, etc).
5622 channelCount = stbi__get16be(s);
5623 if (channelCount < 0 || channelCount > 16)
5624 return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
5625
5626 // Read the rows and columns of the image.
5627 h = stbi__get32be(s);
5628 w = stbi__get32be(s);
5629
5630 // Make sure the depth is 8 bits.
5631 bitdepth = stbi__get16be(s);
5632 if (bitdepth != 8 && bitdepth != 16)
5633 return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
5634
5635 // Make sure the color mode is RGB.
5636 // Valid options are:
5637 // 0: Bitmap
5638 // 1: Grayscale
5639 // 2: Indexed color
5640 // 3: RGB color
5641 // 4: CMYK color
5642 // 7: Multichannel
5643 // 8: Duotone
5644 // 9: Lab color
5645 if (stbi__get16be(s) != 3)
5646 return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
5647
5648 // Skip the Mode Data. (It's the palette for indexed color; other info for other modes.)
5649 stbi__skip(s,stbi__get32be(s) );
5650
5651 // Skip the image resources. (resolution, pen tool paths, etc)
5652 stbi__skip(s, stbi__get32be(s) );
5653
5654 // Skip the reserved data.
5655 stbi__skip(s, stbi__get32be(s) );
5656
5657 // Find out if the data is compressed.
5658 // Known values:
5659 // 0: no compression
5660 // 1: RLE compressed
5661 compression = stbi__get16be(s);
5662 if (compression > 1)
5663 return stbi__errpuc("bad compression", "PSD has an unknown compression format");
5664
5665 // Check size
5666 if (!stbi__mad3sizes_valid(4, w, h, 0))
5667 return stbi__errpuc("too large", "Corrupt PSD");
5668
5669 // Create the destination image.
5670
5671 if (!compression && bitdepth == 16 && bpc == 16) {
5672 out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
5673 ri->bits_per_channel = 16;
5674 } else
5675 out = (stbi_uc *) stbi__malloc(4 * w*h);
5676
5677 if (!out) return stbi__errpuc("outofmem", "Out of memory");
5678 pixelCount = w*h;
5679
5680 // Initialize the data to zero.
5681 //memset( out, 0, pixelCount * 4 );
5682
5683 // Finally, the image data.
5684 if (compression) {
5685 // RLE as used by .PSD and .TIFF
5686 // Loop until you get the number of unpacked bytes you are expecting:
5687 // Read the next source byte into n.
5688 // If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
5689 // Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
5690 // Else if n is 128, noop.
5691 // Endloop
5692
5693 // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data,
5694 // which we're going to just skip.
5695 stbi__skip(s, h * channelCount * 2 );
5696
5697 // Read the RLE data by channel.
5698 for (channel = 0; channel < 4; channel++) {
5699 stbi_uc *p;
5700
5701 p = out+channel;
5702 if (channel >= channelCount) {
5703 // Fill this channel with default data.
5704 for (i = 0; i < pixelCount; i++, p += 4)
5705 *p = (channel == 3 ? 255 : 0);
5706 } else {
5707 // Read the RLE data.
5708 if (!stbi__psd_decode_rle(s, p, pixelCount)) {
5709 STBI_FREE(out);
5710 return stbi__errpuc("corrupt", "bad RLE data");
5711 }
5712 }
5713 }
5714
5715 } else {
5716 // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
5717 // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
5718
5719 // Read the data by channel.
5720 for (channel = 0; channel < 4; channel++) {
5721 if (channel >= channelCount) {
5722 // Fill this channel with default data.
5723 if (bitdepth == 16 && bpc == 16) {
5724 stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
5725 stbi__uint16 val = channel == 3 ? 65535 : 0;
5726 for (i = 0; i < pixelCount; i++, q += 4)
5727 *q = val;
5728 } else {
5729 stbi_uc *p = out+channel;
5730 stbi_uc val = channel == 3 ? 255 : 0;
5731 for (i = 0; i < pixelCount; i++, p += 4)
5732 *p = val;
5733 }
5734 } else {
5735 if (ri->bits_per_channel == 16) { // output bpc
5736 stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
5737 for (i = 0; i < pixelCount; i++, q += 4)
5738 *q = (stbi__uint16) stbi__get16be(s);
5739 } else {
5740 stbi_uc *p = out+channel;
5741 if (bitdepth == 16) { // input bpc
5742 for (i = 0; i < pixelCount; i++, p += 4)
5743 *p = (stbi_uc) (stbi__get16be(s) >> 8);
5744 } else {
5745 for (i = 0; i < pixelCount; i++, p += 4)
5746 *p = stbi__get8(s);
5747 }
5748 }
5749 }
5750 }
5751 }
5752
5753 // remove weird white matte from PSD
5754 if (channelCount >= 4) {
5755 if (ri->bits_per_channel == 16) {
5756 for (i=0; i < w*h; ++i) {
5757 stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
5758 if (pixel[3] != 0 && pixel[3] != 65535) {
5759 float a = pixel[3] / 65535.0f;
5760 float ra = 1.0f / a;
5761 float inv_a = 65535.0f * (1 - ra);
5762 pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
5763 pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
5764 pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
5765 }
5766 }
5767 } else {
5768 for (i=0; i < w*h; ++i) {
5769 unsigned char *pixel = out + 4*i;
5770 if (pixel[3] != 0 && pixel[3] != 255) {
5771 float a = pixel[3] / 255.0f;
5772 float ra = 1.0f / a;
5773 float inv_a = 255.0f * (1 - ra);
5774 pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
5775 pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
5776 pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
5777 }
5778 }
5779 }
5780 }
5781
5782 // convert to desired output format
5783 if (req_comp && req_comp != 4) {
5784 if (ri->bits_per_channel == 16)
5785 out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
5786 else
5787 out = stbi__convert_format(out, 4, req_comp, w, h);
5788 if (out == NULL) return out; // stbi__convert_format frees input on failure
5789 }
5790
5791 if (comp) *comp = 4;
5792 *y = h;
5793 *x = w;
5794
5795 return out;
5796 }
5797 #endif
5798
5799 // *************************************************************************************************
5800 // Softimage PIC loader
5801 // by Tom Seddon
5802 //
5803 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
5804 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
5805
5806 #ifndef STBI_NO_PIC
5807 static int stbi__pic_is4(stbi__context *s,const char *str)
5808 {
5809 int i;
5810 for (i=0; i<4; ++i)
5811 if (stbi__get8(s) != (stbi_uc)str[i])
5812 return 0;
5813
5814 return 1;
5815 }
5816
5817 static int stbi__pic_test_core(stbi__context *s)
5818 {
5819 int i;
5820
5821 if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
5822 return 0;
5823
5824 for(i=0;i<84;++i)
5825 stbi__get8(s);
5826
5827 if (!stbi__pic_is4(s,"PICT"))
5828 return 0;
5829
5830 return 1;
5831 }
5832
5833 typedef struct
5834 {
5835 stbi_uc size,type,channel;
5836 } stbi__pic_packet;
5837
5838 static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
5839 {
5840 int mask=0x80, i;
5841
5842 for (i=0; i<4; ++i, mask>>=1) {
5843 if (channel & mask) {
5844 if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
5845 dest[i]=stbi__get8(s);
5846 }
5847 }
5848
5849 return dest;
5850 }
5851
5852 static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
5853 {
5854 int mask=0x80,i;
5855
5856 for (i=0;i<4; ++i, mask>>=1)
5857 if (channel&mask)
5858 dest[i]=src[i];
5859 }
5860
5861 static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
5862 {
5863 int act_comp=0,num_packets=0,y,chained;
5864 stbi__pic_packet packets[10];
5865
5866 // this will (should...) cater for even some bizarre stuff like having data
5867 // for the same channel in multiple packets.
5868 do {
5869 stbi__pic_packet *packet;
5870
5871 if (num_packets==sizeof(packets)/sizeof(packets[0]))
5872 return stbi__errpuc("bad format","too many packets");
5873
5874 packet = &packets[num_packets++];
5875
5876 chained = stbi__get8(s);
5877 packet->size = stbi__get8(s);
5878 packet->type = stbi__get8(s);
5879 packet->channel = stbi__get8(s);
5880
5881 act_comp |= packet->channel;
5882
5883 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (reading packets)");
5884 if (packet->size != 8) return stbi__errpuc("bad format","packet isn't 8bpp");
5885 } while (chained);
5886
5887 *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
5888
5889 for(y=0; y<height; ++y) {
5890 int packet_idx;
5891
5892 for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
5893 stbi__pic_packet *packet = &packets[packet_idx];
5894 stbi_uc *dest = result+y*width*4;
5895
5896 switch (packet->type) {
5897 default:
5898 return stbi__errpuc("bad format","packet has bad compression type");
5899
5900 case 0: {//uncompressed
5901 int x;
5902
5903 for(x=0;x<width;++x, dest+=4)
5904 if (!stbi__readval(s,packet->channel,dest))
5905 return 0;
5906 break;
5907 }
5908
5909 case 1://Pure RLE
5910 {
5911 int left=width, i;
5912
5913 while (left>0) {
5914 stbi_uc count,value[4];
5915
5916 count=stbi__get8(s);
5917 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pure read count)");
5918
5919 if (count > left)
5920 count = (stbi_uc) left;
5921
5922 if (!stbi__readval(s,packet->channel,value)) return 0;
5923
5924 for(i=0; i<count; ++i,dest+=4)
5925 stbi__copyval(packet->channel,dest,value);
5926 left -= count;
5927 }
5928 }
5929 break;
5930
5931 case 2: {//Mixed RLE
5932 int left=width;
5933 while (left>0) {
5934 int count = stbi__get8(s), i;
5935 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (mixed read count)");
5936
5937 if (count >= 128) { // Repeated
5938 stbi_uc value[4];
5939
5940 if (count==128)
5941 count = stbi__get16be(s);
5942 else
5943 count -= 127;
5944 if (count > left)
5945 return stbi__errpuc("bad file","scanline overrun");
5946
5947 if (!stbi__readval(s,packet->channel,value))
5948 return 0;
5949
5950 for(i=0;i<count;++i, dest += 4)
5951 stbi__copyval(packet->channel,dest,value);
5952 } else { // Raw
5953 ++count;
5954 if (count>left) return stbi__errpuc("bad file","scanline overrun");
5955
5956 for(i=0;i<count;++i, dest+=4)
5957 if (!stbi__readval(s,packet->channel,dest))
5958 return 0;
5959 }
5960 left-=count;
5961 }
5962 break;
5963 }
5964 }
5965 }
5966 }
5967
5968 return result;
5969 }
5970
5971 static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
5972 {
5973 stbi_uc *result;
5974 int i, x,y;
5975 STBI_NOTUSED(ri);
5976
5977 for (i=0; i<92; ++i)
5978 stbi__get8(s);
5979
5980 x = stbi__get16be(s);
5981 y = stbi__get16be(s);
5982 if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pic header)");
5983 if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
5984
5985 stbi__get32be(s); //skip `ratio'
5986 stbi__get16be(s); //skip `fields'
5987 stbi__get16be(s); //skip `pad'
5988
5989 // intermediate buffer is RGBA
5990 result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
5991 memset(result, 0xff, x*y*4);
5992
5993 if (!stbi__pic_load_core(s,x,y,comp, result)) {
5994 STBI_FREE(result);
5995 result=0;
5996 }
5997 *px = x;
5998 *py = y;
5999 if (req_comp == 0) req_comp = *comp;
6000 result=stbi__convert_format(result,4,req_comp,x,y);
6001
6002 return result;
6003 }
6004
6005 static int stbi__pic_test(stbi__context *s)
6006 {
6007 int r = stbi__pic_test_core(s);
6008 stbi__rewind(s);
6009 return r;
6010 }
6011 #endif
6012
6013 // *************************************************************************************************
6014 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
6015
6016 #ifndef STBI_NO_GIF
6017 typedef struct
6018 {
6019 stbi__int16 prefix;
6020 stbi_uc first;
6021 stbi_uc suffix;
6022 } stbi__gif_lzw;
6023
6024 typedef struct
6025 {
6026 int w,h;
6027 stbi_uc *out, *old_out; // output buffer (always 4 components)
6028 int flags, bgindex, ratio, transparent, eflags, delay;
6029 stbi_uc pal[256][4];
6030 stbi_uc lpal[256][4];
6031 stbi__gif_lzw codes[4096];
6032 stbi_uc *color_table;
6033 int parse, step;
6034 int lflags;
6035 int start_x, start_y;
6036 int max_x, max_y;
6037 int cur_x, cur_y;
6038 int line_size;
6039 } stbi__gif;
6040
6041 static int stbi__gif_test_raw(stbi__context *s)
6042 {
6043 int sz;
6044 if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
6045 sz = stbi__get8(s);
6046 if (sz != '9' && sz != '7') return 0;
6047 if (stbi__get8(s) != 'a') return 0;
6048 return 1;
6049 }
6050
6051 static int stbi__gif_test(stbi__context *s)
6052 {
6053 int r = stbi__gif_test_raw(s);
6054 stbi__rewind(s);
6055 return r;
6056 }
6057
6058 static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
6059 {
6060 int i;
6061 for (i=0; i < num_entries; ++i) {
6062 pal[i][2] = stbi__get8(s);
6063 pal[i][1] = stbi__get8(s);
6064 pal[i][0] = stbi__get8(s);
6065 pal[i][3] = transp == i ? 0 : 255;
6066 }
6067 }
6068
6069 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
6070 {
6071 stbi_uc version;
6072 if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
6073 return stbi__err("not GIF", "Corrupt GIF");
6074
6075 version = stbi__get8(s);
6076 if (version != '7' && version != '9') return stbi__err("not GIF", "Corrupt GIF");
6077 if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF");
6078
6079 stbi__g_failure_reason = "";
6080 g->w = stbi__get16le(s);
6081 g->h = stbi__get16le(s);
6082 g->flags = stbi__get8(s);
6083 g->bgindex = stbi__get8(s);
6084 g->ratio = stbi__get8(s);
6085 g->transparent = -1;
6086
6087 if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments
6088
6089 if (is_info) return 1;
6090
6091 if (g->flags & 0x80)
6092 stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
6093
6094 return 1;
6095 }
6096
6097 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
6098 {
6099 stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6100 if (!stbi__gif_header(s, g, comp, 1)) {
6101 STBI_FREE(g);
6102 stbi__rewind( s );
6103 return 0;
6104 }
6105 if (x) *x = g->w;
6106 if (y) *y = g->h;
6107 STBI_FREE(g);
6108 return 1;
6109 }
6110
6111 static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
6112 {
6113 stbi_uc *p, *c;
6114
6115 // recurse to decode the prefixes, since the linked-list is backwards,
6116 // and working backwards through an interleaved image would be nasty
6117 if (g->codes[code].prefix >= 0)
6118 stbi__out_gif_code(g, g->codes[code].prefix);
6119
6120 if (g->cur_y >= g->max_y) return;
6121
6122 p = &g->out[g->cur_x + g->cur_y];
6123 c = &g->color_table[g->codes[code].suffix * 4];
6124
6125 if (c[3] >= 128) {
6126 p[0] = c[2];
6127 p[1] = c[1];
6128 p[2] = c[0];
6129 p[3] = c[3];
6130 }
6131 g->cur_x += 4;
6132
6133 if (g->cur_x >= g->max_x) {
6134 g->cur_x = g->start_x;
6135 g->cur_y += g->step;
6136
6137 while (g->cur_y >= g->max_y && g->parse > 0) {
6138 g->step = (1 << g->parse) * g->line_size;
6139 g->cur_y = g->start_y + (g->step >> 1);
6140 --g->parse;
6141 }
6142 }
6143 }
6144
6145 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
6146 {
6147 stbi_uc lzw_cs;
6148 stbi__int32 len, init_code;
6149 stbi__uint32 first;
6150 stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
6151 stbi__gif_lzw *p;
6152
6153 lzw_cs = stbi__get8(s);
6154 if (lzw_cs > 12) return NULL;
6155 clear = 1 << lzw_cs;
6156 first = 1;
6157 codesize = lzw_cs + 1;
6158 codemask = (1 << codesize) - 1;
6159 bits = 0;
6160 valid_bits = 0;
6161 for (init_code = 0; init_code < clear; init_code++) {
6162 g->codes[init_code].prefix = -1;
6163 g->codes[init_code].first = (stbi_uc) init_code;
6164 g->codes[init_code].suffix = (stbi_uc) init_code;
6165 }
6166
6167 // support no starting clear code
6168 avail = clear+2;
6169 oldcode = -1;
6170
6171 len = 0;
6172 for(;;) {
6173 if (valid_bits < codesize) {
6174 if (len == 0) {
6175 len = stbi__get8(s); // start new block
6176 if (len == 0)
6177 return g->out;
6178 }
6179 --len;
6180 bits |= (stbi__int32) stbi__get8(s) << valid_bits;
6181 valid_bits += 8;
6182 } else {
6183 stbi__int32 code = bits & codemask;
6184 bits >>= codesize;
6185 valid_bits -= codesize;
6186 // @OPTIMIZE: is there some way we can accelerate the non-clear path?
6187 if (code == clear) { // clear code
6188 codesize = lzw_cs + 1;
6189 codemask = (1 << codesize) - 1;
6190 avail = clear + 2;
6191 oldcode = -1;
6192 first = 0;
6193 } else if (code == clear + 1) { // end of stream code
6194 stbi__skip(s, len);
6195 while ((len = stbi__get8(s)) > 0)
6196 stbi__skip(s,len);
6197 return g->out;
6198 } else if (code <= avail) {
6199 if (first) return stbi__errpuc("no clear code", "Corrupt GIF");
6200
6201 if (oldcode >= 0) {
6202 p = &g->codes[avail++];
6203 if (avail > 4096) return stbi__errpuc("too many codes", "Corrupt GIF");
6204 p->prefix = (stbi__int16) oldcode;
6205 p->first = g->codes[oldcode].first;
6206 p->suffix = (code == avail) ? p->first : g->codes[code].first;
6207 } else if (code == avail)
6208 return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6209
6210 stbi__out_gif_code(g, (stbi__uint16) code);
6211
6212 if ((avail & codemask) == 0 && avail <= 0x0FFF) {
6213 codesize++;
6214 codemask = (1 << codesize) - 1;
6215 }
6216
6217 oldcode = code;
6218 } else {
6219 return stbi__errpuc("illegal code in raster", "Corrupt GIF");
6220 }
6221 }
6222 }
6223 }
6224
6225 static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1)
6226 {
6227 int x, y;
6228 stbi_uc *c = g->pal[g->bgindex];
6229 for (y = y0; y < y1; y += 4 * g->w) {
6230 for (x = x0; x < x1; x += 4) {
6231 stbi_uc *p = &g->out[y + x];
6232 p[0] = c[2];
6233 p[1] = c[1];
6234 p[2] = c[0];
6235 p[3] = 0;
6236 }
6237 }
6238 }
6239
6240 // this function is designed to support animated gifs, although stb_image doesn't support it
6241 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp)
6242 {
6243 int i;
6244 stbi_uc *prev_out = 0;
6245
6246 if (g->out == 0 && !stbi__gif_header(s, g, comp,0))
6247 return 0; // stbi__g_failure_reason set by stbi__gif_header
6248
6249 if (!stbi__mad3sizes_valid(g->w, g->h, 4, 0))
6250 return stbi__errpuc("too large", "GIF too large");
6251
6252 prev_out = g->out;
6253 g->out = (stbi_uc *) stbi__malloc_mad3(4, g->w, g->h, 0);
6254 if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory");
6255
6256 switch ((g->eflags & 0x1C) >> 2) {
6257 case 0: // unspecified (also always used on 1st frame)
6258 stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h);
6259 break;
6260 case 1: // do not dispose
6261 if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
6262 g->old_out = prev_out;
6263 break;
6264 case 2: // dispose to background
6265 if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h);
6266 stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y);
6267 break;
6268 case 3: // dispose to previous
6269 if (g->old_out) {
6270 for (i = g->start_y; i < g->max_y; i += 4 * g->w)
6271 memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x);
6272 }
6273 break;
6274 }
6275
6276 for (;;) {
6277 switch (stbi__get8(s)) {
6278 case 0x2C: /* Image Descriptor */
6279 {
6280 int prev_trans = -1;
6281 stbi__int32 x, y, w, h;
6282 stbi_uc *o;
6283
6284 x = stbi__get16le(s);
6285 y = stbi__get16le(s);
6286 w = stbi__get16le(s);
6287 h = stbi__get16le(s);
6288 if (((x + w) > (g->w)) || ((y + h) > (g->h)))
6289 return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
6290
6291 g->line_size = g->w * 4;
6292 g->start_x = x * 4;
6293 g->start_y = y * g->line_size;
6294 g->max_x = g->start_x + w * 4;
6295 g->max_y = g->start_y + h * g->line_size;
6296 g->cur_x = g->start_x;
6297 g->cur_y = g->start_y;
6298
6299 g->lflags = stbi__get8(s);
6300
6301 if (g->lflags & 0x40) {
6302 g->step = 8 * g->line_size; // first interlaced spacing
6303 g->parse = 3;
6304 } else {
6305 g->step = g->line_size;
6306 g->parse = 0;
6307 }
6308
6309 if (g->lflags & 0x80) {
6310 stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
6311 g->color_table = (stbi_uc *) g->lpal;
6312 } else if (g->flags & 0x80) {
6313 if (g->transparent >= 0 && (g->eflags & 0x01)) {
6314 prev_trans = g->pal[g->transparent][3];
6315 g->pal[g->transparent][3] = 0;
6316 }
6317 g->color_table = (stbi_uc *) g->pal;
6318 } else
6319 return stbi__errpuc("missing color table", "Corrupt GIF");
6320
6321 o = stbi__process_gif_raster(s, g);
6322 if (o == NULL) return NULL;
6323
6324 if (prev_trans != -1)
6325 g->pal[g->transparent][3] = (stbi_uc) prev_trans;
6326
6327 return o;
6328 }
6329
6330 case 0x21: // Comment Extension.
6331 {
6332 int len;
6333 if (stbi__get8(s) == 0xF9) { // Graphic Control Extension.
6334 len = stbi__get8(s);
6335 if (len == 4) {
6336 g->eflags = stbi__get8(s);
6337 g->delay = stbi__get16le(s);
6338 g->transparent = stbi__get8(s);
6339 } else {
6340 stbi__skip(s, len);
6341 break;
6342 }
6343 }
6344 while ((len = stbi__get8(s)) != 0)
6345 stbi__skip(s, len);
6346 break;
6347 }
6348
6349 case 0x3B: // gif stream termination code
6350 return (stbi_uc *) s; // using '1' causes warning on some compilers
6351
6352 default:
6353 return stbi__errpuc("unknown code", "Corrupt GIF");
6354 }
6355 }
6356
6357 STBI_NOTUSED(req_comp);
6358 }
6359
6360 static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6361 {
6362 stbi_uc *u = 0;
6363 stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
6364 memset(g, 0, sizeof(*g));
6365 STBI_NOTUSED(ri);
6366
6367 u = stbi__gif_load_next(s, g, comp, req_comp);
6368 if (u == (stbi_uc *) s) u = 0; // end of animated gif marker
6369 if (u) {
6370 *x = g->w;
6371 *y = g->h;
6372 if (req_comp && req_comp != 4)
6373 u = stbi__convert_format(u, 4, req_comp, g->w, g->h);
6374 }
6375 else if (g->out)
6376 STBI_FREE(g->out);
6377 STBI_FREE(g);
6378 return u;
6379 }
6380
6381 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
6382 {
6383 return stbi__gif_info_raw(s,x,y,comp);
6384 }
6385 #endif
6386
6387 // *************************************************************************************************
6388 // Radiance RGBE HDR loader
6389 // originally by Nicolas Schulz
6390 #ifndef STBI_NO_HDR
6391 static int stbi__hdr_test_core(stbi__context *s, const char *signature)
6392 {
6393 int i;
6394 for (i=0; signature[i]; ++i)
6395 if (stbi__get8(s) != signature[i])
6396 return 0;
6397 stbi__rewind(s);
6398 return 1;
6399 }
6400
6401 static int stbi__hdr_test(stbi__context* s)
6402 {
6403 int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
6404 stbi__rewind(s);
6405 if(!r) {
6406 r = stbi__hdr_test_core(s, "#?RGBE\n");
6407 stbi__rewind(s);
6408 }
6409 return r;
6410 }
6411
6412 #define STBI__HDR_BUFLEN 1024
6413 static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
6414 {
6415 int len=0;
6416 char c = '\0';
6417
6418 c = (char) stbi__get8(z);
6419
6420 while (!stbi__at_eof(z) && c != '\n') {
6421 buffer[len++] = c;
6422 if (len == STBI__HDR_BUFLEN-1) {
6423 // flush to end of line
6424 while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
6425 ;
6426 break;
6427 }
6428 c = (char) stbi__get8(z);
6429 }
6430
6431 buffer[len] = 0;
6432 return buffer;
6433 }
6434
6435 static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
6436 {
6437 if ( input[3] != 0 ) {
6438 float f1;
6439 // Exponent
6440 f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
6441 if (req_comp <= 2)
6442 output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
6443 else {
6444 output[0] = input[0] * f1;
6445 output[1] = input[1] * f1;
6446 output[2] = input[2] * f1;
6447 }
6448 if (req_comp == 2) output[1] = 1;
6449 if (req_comp == 4) output[3] = 1;
6450 } else {
6451 switch (req_comp) {
6452 case 4: output[3] = 1; /* fallthrough */
6453 case 3: output[0] = output[1] = output[2] = 0;
6454 break;
6455 case 2: output[1] = 1; /* fallthrough */
6456 case 1: output[0] = 0;
6457 break;
6458 }
6459 }
6460 }
6461
6462 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6463 {
6464 char buffer[STBI__HDR_BUFLEN];
6465 char *token;
6466 int valid = 0;
6467 int width, height;
6468 stbi_uc *scanline;
6469 float *hdr_data;
6470 int len;
6471 unsigned char count, value;
6472 int i, j, k, c1,c2, z;
6473 const char *headerToken;
6474 STBI_NOTUSED(ri);
6475
6476 // Check identifier
6477 headerToken = stbi__hdr_gettoken(s,buffer);
6478 if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
6479 return stbi__errpf("not HDR", "Corrupt HDR image");
6480
6481 // Parse header
6482 for(;;) {
6483 token = stbi__hdr_gettoken(s,buffer);
6484 if (token[0] == 0) break;
6485 if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6486 }
6487
6488 if (!valid) return stbi__errpf("unsupported format", "Unsupported HDR format");
6489
6490 // Parse width and height
6491 // can't use sscanf() if we're not using stdio!
6492 token = stbi__hdr_gettoken(s,buffer);
6493 if (strncmp(token, "-Y ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6494 token += 3;
6495 height = (int) strtol(token, &token, 10);
6496 while (*token == ' ') ++token;
6497 if (strncmp(token, "+X ", 3)) return stbi__errpf("unsupported data layout", "Unsupported HDR format");
6498 token += 3;
6499 width = (int) strtol(token, NULL, 10);
6500
6501 *x = width;
6502 *y = height;
6503
6504 if (comp) *comp = 3;
6505 if (req_comp == 0) req_comp = 3;
6506
6507 if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
6508 return stbi__errpf("too large", "HDR image is too large");
6509
6510 // Read data
6511 hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
6512 if (!hdr_data)
6513 return stbi__errpf("outofmem", "Out of memory");
6514
6515 // Load image data
6516 // image data is stored as some number of sca
6517 if ( width < 8 || width >= 32768) {
6518 // Read flat data
6519 for (j=0; j < height; ++j) {
6520 for (i=0; i < width; ++i) {
6521 stbi_uc rgbe[4];
6522 main_decode_loop:
6523 stbi__getn(s, rgbe, 4);
6524 stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
6525 }
6526 }
6527 } else {
6528 // Read RLE-encoded data
6529 scanline = NULL;
6530
6531 for (j = 0; j < height; ++j) {
6532 c1 = stbi__get8(s);
6533 c2 = stbi__get8(s);
6534 len = stbi__get8(s);
6535 if (c1 != 2 || c2 != 2 || (len & 0x80)) {
6536 // not run-length encoded, so we have to actually use THIS data as a decoded
6537 // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
6538 stbi_uc rgbe[4];
6539 rgbe[0] = (stbi_uc) c1;
6540 rgbe[1] = (stbi_uc) c2;
6541 rgbe[2] = (stbi_uc) len;
6542 rgbe[3] = (stbi_uc) stbi__get8(s);
6543 stbi__hdr_convert(hdr_data, rgbe, req_comp);
6544 i = 1;
6545 j = 0;
6546 STBI_FREE(scanline);
6547 goto main_decode_loop; // yes, this makes no sense
6548 }
6549 len <<= 8;
6550 len |= stbi__get8(s);
6551 if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
6552 if (scanline == NULL) {
6553 scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
6554 if (!scanline) {
6555 STBI_FREE(hdr_data);
6556 return stbi__errpf("outofmem", "Out of memory");
6557 }
6558 }
6559
6560 for (k = 0; k < 4; ++k) {
6561 int nleft;
6562 i = 0;
6563 while ((nleft = width - i) > 0) {
6564 count = stbi__get8(s);
6565 if (count > 128) {
6566 // Run
6567 value = stbi__get8(s);
6568 count -= 128;
6569 if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
6570 for (z = 0; z < count; ++z)
6571 scanline[i++ * 4 + k] = value;
6572 } else {
6573 // Dump
6574 if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
6575 for (z = 0; z < count; ++z)
6576 scanline[i++ * 4 + k] = stbi__get8(s);
6577 }
6578 }
6579 }
6580 for (i=0; i < width; ++i)
6581 stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
6582 }
6583 if (scanline)
6584 STBI_FREE(scanline);
6585 }
6586
6587 return hdr_data;
6588 }
6589
6590 static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
6591 {
6592 char buffer[STBI__HDR_BUFLEN];
6593 char *token;
6594 int valid = 0;
6595
6596 if (stbi__hdr_test(s) == 0) {
6597 stbi__rewind( s );
6598 return 0;
6599 }
6600
6601 for(;;) {
6602 token = stbi__hdr_gettoken(s,buffer);
6603 if (token[0] == 0) break;
6604 if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
6605 }
6606
6607 if (!valid) {
6608 stbi__rewind( s );
6609 return 0;
6610 }
6611 token = stbi__hdr_gettoken(s,buffer);
6612 if (strncmp(token, "-Y ", 3)) {
6613 stbi__rewind( s );
6614 return 0;
6615 }
6616 token += 3;
6617 *y = (int) strtol(token, &token, 10);
6618 while (*token == ' ') ++token;
6619 if (strncmp(token, "+X ", 3)) {
6620 stbi__rewind( s );
6621 return 0;
6622 }
6623 token += 3;
6624 *x = (int) strtol(token, NULL, 10);
6625 *comp = 3;
6626 return 1;
6627 }
6628 #endif // STBI_NO_HDR
6629
6630 #ifndef STBI_NO_BMP
6631 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
6632 {
6633 void *p;
6634 stbi__bmp_data info;
6635
6636 info.all_a = 255;
6637 p = stbi__bmp_parse_header(s, &info);
6638 stbi__rewind( s );
6639 if (p == NULL)
6640 return 0;
6641 *x = s->img_x;
6642 *y = s->img_y;
6643 *comp = info.ma ? 4 : 3;
6644 return 1;
6645 }
6646 #endif
6647
6648 #ifndef STBI_NO_PSD
6649 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
6650 {
6651 int channelCount;
6652 if (stbi__get32be(s) != 0x38425053) {
6653 stbi__rewind( s );
6654 return 0;
6655 }
6656 if (stbi__get16be(s) != 1) {
6657 stbi__rewind( s );
6658 return 0;
6659 }
6660 stbi__skip(s, 6);
6661 channelCount = stbi__get16be(s);
6662 if (channelCount < 0 || channelCount > 16) {
6663 stbi__rewind( s );
6664 return 0;
6665 }
6666 *y = stbi__get32be(s);
6667 *x = stbi__get32be(s);
6668 if (stbi__get16be(s) != 8) {
6669 stbi__rewind( s );
6670 return 0;
6671 }
6672 if (stbi__get16be(s) != 3) {
6673 stbi__rewind( s );
6674 return 0;
6675 }
6676 *comp = 4;
6677 return 1;
6678 }
6679 #endif
6680
6681 #ifndef STBI_NO_PIC
6682 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
6683 {
6684 int act_comp=0,num_packets=0,chained;
6685 stbi__pic_packet packets[10];
6686
6687 if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
6688 stbi__rewind(s);
6689 return 0;
6690 }
6691
6692 stbi__skip(s, 88);
6693
6694 *x = stbi__get16be(s);
6695 *y = stbi__get16be(s);
6696 if (stbi__at_eof(s)) {
6697 stbi__rewind( s);
6698 return 0;
6699 }
6700 if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
6701 stbi__rewind( s );
6702 return 0;
6703 }
6704
6705 stbi__skip(s, 8);
6706
6707 do {
6708 stbi__pic_packet *packet;
6709
6710 if (num_packets==sizeof(packets)/sizeof(packets[0]))
6711 return 0;
6712
6713 packet = &packets[num_packets++];
6714 chained = stbi__get8(s);
6715 packet->size = stbi__get8(s);
6716 packet->type = stbi__get8(s);
6717 packet->channel = stbi__get8(s);
6718 act_comp |= packet->channel;
6719
6720 if (stbi__at_eof(s)) {
6721 stbi__rewind( s );
6722 return 0;
6723 }
6724 if (packet->size != 8) {
6725 stbi__rewind( s );
6726 return 0;
6727 }
6728 } while (chained);
6729
6730 *comp = (act_comp & 0x10 ? 4 : 3);
6731
6732 return 1;
6733 }
6734 #endif
6735
6736 // *************************************************************************************************
6737 // Portable Gray Map and Portable Pixel Map loader
6738 // by Ken Miller
6739 //
6740 // PGM: http://netpbm.sourceforge.net/doc/pgm.html
6741 // PPM: http://netpbm.sourceforge.net/doc/ppm.html
6742 //
6743 // Known limitations:
6744 // Does not support comments in the header section
6745 // Does not support ASCII image data (formats P2 and P3)
6746 // Does not support 16-bit-per-channel
6747
6748 #ifndef STBI_NO_PNM
6749
6750 static int stbi__pnm_test(stbi__context *s)
6751 {
6752 char p, t;
6753 p = (char) stbi__get8(s);
6754 t = (char) stbi__get8(s);
6755 if (p != 'P' || (t != '5' && t != '6')) {
6756 stbi__rewind( s );
6757 return 0;
6758 }
6759 return 1;
6760 }
6761
6762 static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
6763 {
6764 stbi_uc *out;
6765 STBI_NOTUSED(ri);
6766
6767 if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n))
6768 return 0;
6769
6770 *x = s->img_x;
6771 *y = s->img_y;
6772 *comp = s->img_n;
6773
6774 if (!stbi__mad3sizes_valid(s->img_n, s->img_x, s->img_y, 0))
6775 return stbi__errpuc("too large", "PNM too large");
6776
6777 out = (stbi_uc *) stbi__malloc_mad3(s->img_n, s->img_x, s->img_y, 0);
6778 if (!out) return stbi__errpuc("outofmem", "Out of memory");
6779 stbi__getn(s, out, s->img_n * s->img_x * s->img_y);
6780
6781 if (req_comp && req_comp != s->img_n) {
6782 out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
6783 if (out == NULL) return out; // stbi__convert_format frees input on failure
6784 }
6785 return out;
6786 }
6787
6788 static int stbi__pnm_isspace(char c)
6789 {
6790 return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
6791 }
6792
6793 static void stbi__pnm_skip_whitespace(stbi__context *s, char *c)
6794 {
6795 for (;;) {
6796 while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
6797 *c = (char) stbi__get8(s);
6798
6799 if (stbi__at_eof(s) || *c != '#')
6800 break;
6801
6802 while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
6803 *c = (char) stbi__get8(s);
6804 }
6805 }
6806
6807 static int stbi__pnm_isdigit(char c)
6808 {
6809 return c >= '0' && c <= '9';
6810 }
6811
6812 static int stbi__pnm_getinteger(stbi__context *s, char *c)
6813 {
6814 int value = 0;
6815
6816 while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
6817 value = value*10 + (*c - '0');
6818 *c = (char) stbi__get8(s);
6819 }
6820
6821 return value;
6822 }
6823
6824 static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
6825 {
6826 int maxv;
6827 char c, p, t;
6828
6829 stbi__rewind( s );
6830
6831 // Get identifier
6832 p = (char) stbi__get8(s);
6833 t = (char) stbi__get8(s);
6834 if (p != 'P' || (t != '5' && t != '6')) {
6835 stbi__rewind( s );
6836 return 0;
6837 }
6838
6839 *comp = (t == '6') ? 3 : 1; // '5' is 1-component .pgm; '6' is 3-component .ppm
6840
6841 c = (char) stbi__get8(s);
6842 stbi__pnm_skip_whitespace(s, &c);
6843
6844 *x = stbi__pnm_getinteger(s, &c); // read width
6845 stbi__pnm_skip_whitespace(s, &c);
6846
6847 *y = stbi__pnm_getinteger(s, &c); // read height
6848 stbi__pnm_skip_whitespace(s, &c);
6849
6850 maxv = stbi__pnm_getinteger(s, &c); // read max value
6851
6852 if (maxv > 255)
6853 return stbi__err("max value > 255", "PPM image not 8-bit");
6854 else
6855 return 1;
6856 }
6857 #endif
6858
6859 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
6860 {
6861 #ifndef STBI_NO_JPEG
6862 if (stbi__jpeg_info(s, x, y, comp)) return 1;
6863 #endif
6864
6865 #ifndef STBI_NO_PNG
6866 if (stbi__png_info(s, x, y, comp)) return 1;
6867 #endif
6868
6869 #ifndef STBI_NO_GIF
6870 if (stbi__gif_info(s, x, y, comp)) return 1;
6871 #endif
6872
6873 #ifndef STBI_NO_BMP
6874 if (stbi__bmp_info(s, x, y, comp)) return 1;
6875 #endif
6876
6877 #ifndef STBI_NO_PSD
6878 if (stbi__psd_info(s, x, y, comp)) return 1;
6879 #endif
6880
6881 #ifndef STBI_NO_PIC
6882 if (stbi__pic_info(s, x, y, comp)) return 1;
6883 #endif
6884
6885 #ifndef STBI_NO_PNM
6886 if (stbi__pnm_info(s, x, y, comp)) return 1;
6887 #endif
6888
6889 #ifndef STBI_NO_HDR
6890 if (stbi__hdr_info(s, x, y, comp)) return 1;
6891 #endif
6892
6893 // test tga last because it's a crappy test!
6894 #ifndef STBI_NO_TGA
6895 if (stbi__tga_info(s, x, y, comp))
6896 return 1;
6897 #endif
6898 return stbi__err("unknown image type", "Image not of any known type, or corrupt");
6899 }
6900
6901 #ifndef STBI_NO_STDIO
6902 STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
6903 {
6904 FILE *f = stbi__fopen(filename, "rb");
6905 int result;
6906 if (!f) return stbi__err("can't fopen", "Unable to open file");
6907 result = stbi_info_from_file(f, x, y, comp);
6908 fclose(f);
6909 return result;
6910 }
6911
6912 STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
6913 {
6914 int r;
6915 stbi__context s;
6916 long pos = ftell(f);
6917 stbi__start_file(&s, f);
6918 r = stbi__info_main(&s,x,y,comp);
6919 fseek(f,pos,SEEK_SET);
6920 return r;
6921 }
6922 #endif // !STBI_NO_STDIO
6923
6924 STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
6925 {
6926 stbi__context s;
6927 stbi__start_mem(&s,buffer,len);
6928 return stbi__info_main(&s,x,y,comp);
6929 }
6930
6931 STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
6932 {
6933 stbi__context s;
6934 stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
6935 return stbi__info_main(&s,x,y,comp);
6936 }
6937
6938 #endif // STB_IMAGE_IMPLEMENTATION
6939
6940 /*
6941 revision history:
6942 2.13 (2016-11-29) add 16-bit API, only supported for PNG right now
6943 2.12 (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
6944 2.11 (2016-04-02) allocate large structures on the stack
6945 remove white matting for transparent PSD
6946 fix reported channel count for PNG & BMP
6947 re-enable SSE2 in non-gcc 64-bit
6948 support RGB-formatted JPEG
6949 read 16-bit PNGs (only as 8-bit)
6950 2.10 (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
6951 2.09 (2016-01-16) allow comments in PNM files
6952 16-bit-per-pixel TGA (not bit-per-component)
6953 info() for TGA could break due to .hdr handling
6954 info() for BMP to shares code instead of sloppy parse
6955 can use STBI_REALLOC_SIZED if allocator doesn't support realloc
6956 code cleanup
6957 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
6958 2.07 (2015-09-13) fix compiler warnings
6959 partial animated GIF support
6960 limited 16-bpc PSD support
6961 #ifdef unused functions
6962 bug with < 92 byte PIC,PNM,HDR,TGA
6963 2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value
6964 2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning
6965 2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit
6966 2.03 (2015-04-12) extra corruption checking (mmozeiko)
6967 stbi_set_flip_vertically_on_load (nguillemot)
6968 fix NEON support; fix mingw support
6969 2.02 (2015-01-19) fix incorrect assert, fix warning
6970 2.01 (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
6971 2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
6972 2.00 (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
6973 progressive JPEG (stb)
6974 PGM/PPM support (Ken Miller)
6975 STBI_MALLOC,STBI_REALLOC,STBI_FREE
6976 GIF bugfix -- seemingly never worked
6977 STBI_NO_*, STBI_ONLY_*
6978 1.48 (2014-12-14) fix incorrectly-named assert()
6979 1.47 (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
6980 optimize PNG (ryg)
6981 fix bug in interlaced PNG with user-specified channel count (stb)
6982 1.46 (2014-08-26)
6983 fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
6984 1.45 (2014-08-16)
6985 fix MSVC-ARM internal compiler error by wrapping malloc
6986 1.44 (2014-08-07)
6987 various warning fixes from Ronny Chevalier
6988 1.43 (2014-07-15)
6989 fix MSVC-only compiler problem in code changed in 1.42
6990 1.42 (2014-07-09)
6991 don't define _CRT_SECURE_NO_WARNINGS (affects user code)
6992 fixes to stbi__cleanup_jpeg path
6993 added STBI_ASSERT to avoid requiring assert.h
6994 1.41 (2014-06-25)
6995 fix search&replace from 1.36 that messed up comments/error messages
6996 1.40 (2014-06-22)
6997 fix gcc struct-initialization warning
6998 1.39 (2014-06-15)
6999 fix to TGA optimization when req_comp != number of components in TGA;
7000 fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
7001 add support for BMP version 5 (more ignored fields)
7002 1.38 (2014-06-06)
7003 suppress MSVC warnings on integer casts truncating values
7004 fix accidental rename of 'skip' field of I/O
7005 1.37 (2014-06-04)
7006 remove duplicate typedef
7007 1.36 (2014-06-03)
7008 convert to header file single-file library
7009 if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
7010 1.35 (2014-05-27)
7011 various warnings
7012 fix broken STBI_SIMD path
7013 fix bug where stbi_load_from_file no longer left file pointer in correct place
7014 fix broken non-easy path for 32-bit BMP (possibly never used)
7015 TGA optimization by Arseny Kapoulkine
7016 1.34 (unknown)
7017 use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
7018 1.33 (2011-07-14)
7019 make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
7020 1.32 (2011-07-13)
7021 support for "info" function for all supported filetypes (SpartanJ)
7022 1.31 (2011-06-20)
7023 a few more leak fixes, bug in PNG handling (SpartanJ)
7024 1.30 (2011-06-11)
7025 added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
7026 removed deprecated format-specific test/load functions
7027 removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
7028 error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
7029 fix inefficiency in decoding 32-bit BMP (David Woo)
7030 1.29 (2010-08-16)
7031 various warning fixes from Aurelien Pocheville
7032 1.28 (2010-08-01)
7033 fix bug in GIF palette transparency (SpartanJ)
7034 1.27 (2010-08-01)
7035 cast-to-stbi_uc to fix warnings
7036 1.26 (2010-07-24)
7037 fix bug in file buffering for PNG reported by SpartanJ
7038 1.25 (2010-07-17)
7039 refix trans_data warning (Won Chun)
7040 1.24 (2010-07-12)
7041 perf improvements reading from files on platforms with lock-heavy fgetc()
7042 minor perf improvements for jpeg
7043 deprecated type-specific functions so we'll get feedback if they're needed
7044 attempt to fix trans_data warning (Won Chun)
7045 1.23 fixed bug in iPhone support
7046 1.22 (2010-07-10)
7047 removed image *writing* support
7048 stbi_info support from Jetro Lauha
7049 GIF support from Jean-Marc Lienher
7050 iPhone PNG-extensions from James Brown
7051 warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
7052 1.21 fix use of 'stbi_uc' in header (reported by jon blow)
7053 1.20 added support for Softimage PIC, by Tom Seddon
7054 1.19 bug in interlaced PNG corruption check (found by ryg)
7055 1.18 (2008-08-02)
7056 fix a threading bug (local mutable static)
7057 1.17 support interlaced PNG
7058 1.16 major bugfix - stbi__convert_format converted one too many pixels
7059 1.15 initialize some fields for thread safety
7060 1.14 fix threadsafe conversion bug
7061 header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
7062 1.13 threadsafe
7063 1.12 const qualifiers in the API
7064 1.11 Support installable IDCT, colorspace conversion routines
7065 1.10 Fixes for 64-bit (don't use "unsigned long")
7066 optimized upsampling by Fabian "ryg" Giesen
7067 1.09 Fix format-conversion for PSD code (bad global variables!)
7068 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz
7069 1.07 attempt to fix C++ warning/errors again
7070 1.06 attempt to fix C++ warning/errors again
7071 1.05 fix TGA loading to return correct *comp and use good luminance calc
7072 1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free
7073 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR
7074 1.02 support for (subset of) HDR files, float interface for preferred access to them
7075 1.01 fix bug: possible bug in handling right-side up bmps... not sure
7076 fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
7077 1.00 interface to zlib that skips zlib header
7078 0.99 correct handling of alpha in palette
7079 0.98 TGA loader by lonesock; dynamically add loaders (untested)
7080 0.97 jpeg errors on too large a file; also catch another malloc failure
7081 0.96 fix detection of invalid v value - particleman@mollyrocket forum
7082 0.95 during header scan, seek to markers in case of padding
7083 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same
7084 0.93 handle jpegtran output; verbose errors
7085 0.92 read 4,8,16,24,32-bit BMP files of several formats
7086 0.91 output 24-bit Windows 3.0 BMP files
7087 0.90 fix a few more warnings; bump version number to approach 1.0
7088 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd
7089 0.60 fix compiling as c++
7090 0.59 fix warnings: merge Dave Moore's -Wall fixes
7091 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian
7092 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
7093 0.56 fix bug: zlib uncompressed mode len vs. nlen
7094 0.55 fix bug: restart_interval not initialized to 0
7095 0.54 allow NULL for 'int *comp'
7096 0.53 fix bug in png 3->4; speedup png decoding
7097 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments
7098 0.51 obey req_comp requests, 1-component jpegs return as 1-component,
7099 on 'test' only check type, not whether we support this variant
7100 0.50 (2006-11-19)
7101 first released version
7102 */