Images (with umlauts in names) not shown after uploading to staging

Hi!
I’ve manually imported a (very) old site (orig. WP4…9/php 5.6=>then 5.2, php7.2) into DevKinsta, using a backup zip file (manually created at the original hoster, containing a SQL file and the public directory), and then did the WP/PHP upgrades (including theme & plugin maintenance). Everything worked fine locally within DevKinsta.
After creating a new Kinsta WordPress site with an empty live and staging env (no WP installed), I uploaded the local DevKinsta content and db to staging (using the Sync/Upload command from DevKinsta).

It mostly worked, but every image that has German umlauts (e.g. ‘ö’,‘Ö’,‘ä’,‘Ä’) or other special chars in its filename was not shown.

In the web browser console I got 404 errors. There seems to be a mismatch between the rendered UTF-8 HTML img src names and the actual filenames in the uploads directory.

BTW: The database (and wp-config.php) uses utf8 and utf8mb4_unicode_520_ci/utf8mb4_general_ci for tables.
After further investigation I found that the filenames use UTF-8, but are not normalized UTF-8 (this is also the case on DevKinsta and the original server, where it is not a problem at all). To check whether there was an rsync problem during upload, I also uploaded a zip file with my uploads content, unzipped it, and imported a new db backup using DevKinsta Adminer. But that didn’t change anything.

Regarding normalization I found, as an example showing the char ‘ö’ (aka oe):
In the HTML, as src of the img tag (hex): C3B6 … normalized UTF-8
In the filesystem (uploads dir), in the filename (hex): 6FCC88 … not normalized
Both cases show the char ‘ö’.
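
For reference, the two byte sequences above can be reproduced with a few lines of Python. This is only an illustrative sketch using the standard `unicodedata` module; the helper name `utf8_hex` is made up for this example and is not part of the original post.

```python
import unicodedata


def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as upper-case hex."""
    return text.encode("utf-8").hex().upper()


# Composed form (NFC): a single code point, U+00F6.
composed = unicodedata.normalize("NFC", "o\u0308")
# Decomposed form (NFD): 'o' followed by the combining diaeresis U+0308.
decomposed = unicodedata.normalize("NFD", "\u00f6")

print(utf8_hex(composed))      # C3B6
print(utf8_hex(decomposed))    # 6FCC88
print(composed == decomposed)  # False: same glyph on screen, different bytes
```

Both strings render as the same character, which is why the difference only becomes visible in a hex dump.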

This does not cause a problem in either the local DevKinsta site or the original site, only within the Kinsta staging env. Why?

Any ideas/support will be highly appreciated!

Hi @Vitus :wave:

Welcome to the Kinsta Community!

Could you send me the MyKinsta URL link of your staging via private message? We would like to take a closer look at why those images with special characters show a 404.

@Vitus

You may also try to decompress the zip file manually using the -a parameter, for example unzip -a wpfileshere.zip, as suggested in this external forum thread: How can I correctly decompress a ZIP archive of files with Hebrew names? (Unix & Linux Stack Exchange)

I hope it helps!

unzip -a is meant to convert the content of text files, which is not my problem. The problem is the filename quirks and/or the Kinsta runtime environment (as the same filenames work locally).
I’ll try using the plugin Migrate Guru (or Duplicator) to transfer my local content again (and keep you updated).

I’ve written a small program to rename/normalize UTF-8 filenames.
This seems to fix my problems, but I’m unhappy because I’m not sure why it happens in the first place. I also haven’t tested yet whether uploading new client files (from a Windows PC) will cause the same problems.

My fixnames.php program:

#!/usr/bin/php
<?php
# see also: https://github.com/neitanod/forceutf8
# see also: https://stackoverflow.com/questions/67532384/encoding-problem-char-looks-right-but-is-not
# see also: https://www.php.net/manual/de/class.normalizer.php, https://en.wikipedia.org/wiki/Unicode_equivalence

#error_reporting(E_ALL & ~E_DEPRECATED); # utf8_decode, ...

# Return the raw bytes of a string as a hex string.
function str_to_hex($string) {
  $hexstr = unpack('H*', $string);
  return array_shift($hexstr);
}

# Walk $path recursively and report (or, if $dofix is true, rename)
# files whose names are not Unicode-normalized.
function fixnames($path, $dofix = false) {
  $d = opendir($path);
  # compare against false so that a file named "0" does not end the loop
  while (($f = readdir($d)) !== false) {
    if (($f != '.') && ($f != '..')) {
      $enc = mb_detect_encoding($f); # find ordinary filename (ASCII; no umlauts/special characters)
      if ($enc) {
        if ($enc != 'UTF-8' && $enc != 'ASCII') {
          /* convert filename using convmv or
             'find -mindepth 1 -exec sh -c 'mv "$1" "$(echo "$1" | iconv -f cp1252 -t utf8)"' sh {} \;'
             using correct original charset */
          print("CONV: " . $enc . ": " . $path . '/' . $f . "\n");
        } else {
          #$form = Normalizer::FORM_C;
          $form = Normalizer::FORM_KC;
          print("HEX: " . str_to_hex($f) . " = " . $f . "\n"); # orig fname as hex
          if (Normalizer::isNormalized($f, $form)) {
            print($enc . ": " . $path . '/' . $f . "\n");
          } else {
            # normalize filename
            /* see also: https://unix.stackexchange.com/questions/251969/how-can-i-correctly-decompress-a-zip-archive-of-files-with-hebrew-names
               BTW: convmv didn't work either, because it insisted that it is already utf8 */
            $n = Normalizer::normalize($f, $form);
            print($enc . ": " . $path . '/' . $f . " => " . $n . "\n");
            print("HEXN: " . str_to_hex($n) . " <= " . $f . "\n"); # normalized fname as hex
            if (is_file($path . '/' . $f) && $dofix) {
              if (rename($path . '/' . $f, $path . '/' . $n)) {
                print("RENAMED: " . $path . '/' . $f . "=>" . $path . '/' . $n . "\n");
              } else {
                print("RENAME FAILED: " . $path . '/' . $f . "\n");
              }
            }
          }
        }
      } else {
        print("UNKNOWN: " . $path . '/' . $f . "\n");
      }
      if (is_dir($path . '/' . $f)) {
        fixnames($path . '/' . $f, $dofix);
      }
    }
  }
  closedir($d);
}

fixnames('.', false); # dry run; pass true to actually rename
?>
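
One thing a bulk rename like this may want to guard against is a collision: if both the composed and the decomposed spelling of a name already exist in the same directory, renaming one onto the other would overwrite a file. The following Python sketch plans the renames without touching the filesystem and flags such cases. It is an illustration only, not the script used in this thread: the function name `plan_renames` and the sample listing are made up, and it uses NFC rather than the NFKC form chosen above (NFKC additionally rewrites compatibility characters such as ligatures, which may or may not be wanted for filenames).

```python
import unicodedata


def plan_renames(names):
    """Return (old, new, collides) for every name that is not in NFC form.

    collides is True when the NFC name is already taken by another entry,
    in which case a blind rename would overwrite that file.
    """
    taken = set(names)
    plan = []
    for old in names:
        new = unicodedata.normalize("NFC", old)
        if new != old:
            plan.append((old, new, new in taken))
            taken.add(new)  # later names mapping to the same target also collide
    return plan


# Hypothetical directory listing: the last two entries are the same
# visible name in decomposed and composed form.
listing = ["gru\u0308n.jpg", "blau.jpg", "o\u0308l.jpg", "\u00f6l.jpg"]
for old, new, collides in plan_renames(listing):
    print(ascii(old), "->", ascii(new), "COLLISION" if collides else "ok")
```

Printing with `ascii()` keeps the combining characters visible as escapes, so the two spellings can be told apart in a terminal.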

FYI: Migrate Guru doesn’t work with a local (DevKinsta) site.

Hi Adrian,

Just to keep you updated: the Duplicator plugin didn’t work either (even after setting the max execution time to 0 and increasing the max memory limit to 900M). It always terminated the creation process with an unspecific error message. Duplicator Pro is not an option, because Kinsta banned it.

The best info I could find so far is this thread: tar and utf-8 (The FreeBSD Forums).
This seems to be my original problem, because I started from a tarball provided by the current hoster. I’ll try to recreate my uploads content from the tar file using different LOCALE/LANG settings.

Unresolved/unclear is: Why is it working within DevKinsta, but not on the Kinsta staging env? Maybe some differences in Nginx/LANG settings?

Running env LANG=C tar -xf wp-backup.tar.gz didn’t solve my problem. The same content was restored as without the LANG setting (the default locale is LANG=“de_AT.UTF-8”).
This explicit env-setting hack might work when creating the tarball, but I couldn’t check that because I have no access to the original server.

I proceeded with my own PHP program to fix all non-normalized UTF-8 filenames, i.e. those that use a combining character after the base char for (German) umlauts. For example, the German umlaut ‘ü’ should be represented as C3 BC (hex) when normalized, but is encoded as 75 CC 88 in the non-normalized version, as extracted from the tar file. The WP database has stored the normalized filenames, so requesting such a file gives a 404 error from Nginx.
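
The mismatch described here can be modelled in a few lines. The sketch below is a simplified illustration in Python, not how Nginx or WordPress actually resolve files: the names `find_on_disk`, `db_name` and `disk_names` are hypothetical, and the directory listing is just a list of strings.

```python
import unicodedata


def find_on_disk(requested, disk_names):
    """Return the on-disk name that matches requested when both are compared in NFC."""
    wanted = unicodedata.normalize("NFC", requested)
    for name in disk_names:
        if unicodedata.normalize("NFC", name) == wanted:
            return name
    return None


# Assumed example data: the database holds the composed name (C3 BC),
# the uploads directory holds the decomposed one (75 CC 88).
db_name = "m\u00fcller.jpg"
disk_names = ["mu\u0308ller.jpg"]

# An exact lookup misses, which is what surfaces as a 404.
print(db_name in disk_names)                           # False
# Comparing normalized forms finds the file.
print(find_on_disk(db_name, disk_names) is not None)   # True
```

A server that looks names up byte by byte behaves like the first check, so the names on disk and in the database have to agree exactly.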

Still baffled why it did work in DevKinsta?
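
One possible explanation, which is not confirmed anywhere in this thread, is a difference between filesystems: some (the default macOS filesystems, for example) treat composed and decomposed names as the same file, whereas typical Linux filesystems compare names byte by byte. A small probe like the following Python sketch could be run in each environment to check. The function name `treats_nfc_nfd_as_same` and the probe filename are made up for this example.

```python
import os
import tempfile
import unicodedata


def treats_nfc_nfd_as_same(directory):
    """Create a file with a decomposed name and look it up by the composed name.

    Returns True if the filesystem finds the file under the composed spelling.
    """
    nfd = unicodedata.normalize("NFD", "probe-\u00fc.txt")
    nfc = unicodedata.normalize("NFC", nfd)
    path_nfd = os.path.join(directory, nfd)
    with open(path_nfd, "w", encoding="utf-8") as handle:
        handle.write("probe")
    try:
        return os.path.exists(os.path.join(directory, nfc))
    finally:
        os.remove(path_nfd)  # always clean up the probe file


with tempfile.TemporaryDirectory() as tmp:
    print(treats_nfc_nfd_as_same(tmp))
```

To test a specific location, pass the uploads directory (or a scratch directory on the same volume) instead of a temporary one. The result depends on the filesystem, so no expected output is given here.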

Hi @Vitus ,

Thank you for sharing the steps you took to resolve the issue. Your contribution is highly appreciated and valuable to this community as a reference. However, I am not sure whether you ran your fix on all of the affected images, as images containing the character “ü” are still giving a 404 error, at least on your staging homepage. Despite this issue, we are grateful for your workaround.

I have compared the Nginx configuration between DevKinsta and our live environment, and both use the same encoding, which is “charset UTF-8;”. So it is still unclear why it worked on your DevKinsta while giving a 404 in staging.

Furthermore, I attempted to download an image that was causing a 404 error on your staging due to the u-umlaut character “ü”, both in my DevKinsta (Hyper-V) and via SFTP from my live staging, and in both cases the name resolved using the normalized encoding “%C3%BC”. By the way, I am using a Windows 10 PC.
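
To make the encodings in a request path easier to compare, here is a short Python sketch that percent-encodes the same visible filename in both forms. It uses `urllib.parse.quote` from the standard library; the filename `grün.jpg` is an invented example, not one of the files from the site.

```python
import unicodedata
from urllib.parse import quote

# The same visible name in composed (NFC) and decomposed (NFD) form.
name_nfc = unicodedata.normalize("NFC", "gr\u00fcn.jpg")
name_nfd = unicodedata.normalize("NFD", "gr\u00fcn.jpg")

# quote() encodes non-ASCII characters as UTF-8 percent escapes by default.
print(quote(name_nfc))  # gr%C3%BCn.jpg
print(quote(name_nfd))  # gru%CC%88n.jpg
```

A request for the first path only matches a file stored under the composed name, and vice versa.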

Hi Adrian,

The problem was not solved correctly by the filename-normalization techniques. I’m still not sure why images were shown in my Safari browser (maybe from cache, although I always did force-reloads; unfortunately I didn’t test with different browsers at that point).

Finally, with help from Kinsta support, I took the following actions, which have now resolved the problems on the staging env as well.

a) Ran “mysqldump --default-character-set=utf8mb4” directly in the Docker terminal, then uploaded that SQL file manually to Kinsta. On the problematic staging env, Adminer had shown the database collation as latin1_swedish_ci (the default setting), not utf8mb4_unicode_ci as it is now.
b) Also uploaded the original tar.xz file from production (not my DevKinsta files as in previous attempts) into the staging env and extracted only the uploads content directly into the respective staging uploads directory.
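
As background for step a), the Python sketch below shows the general failure mode when UTF-8 bytes are interpreted under a Latin-1 character set. It is only an illustration of that mechanism; the thread does not establish that this is exactly what happened to the DevKinsta dump, and the function name `misread_as_latin1` and the sample filename are made up.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the same bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


original = "gr\u00fcn.jpg"
garbled = misread_as_latin1(original)

print(garbled)  # grÃ¼n.jpg
# In this simple case the damage is reversible by undoing the wrong decode.
print(garbled.encode("latin-1").decode("utf-8") == original)  # True
```

Stating the character set explicitly on export, as in step a), avoids leaving that interpretation to the defaults on either side.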

My DevKinsta runs on a MacBook Pro (OSX).

BTW: The original images were resized, web-optimized and uploaded into the production system from an OSX system (after downloading them from a NextCloud server). Then some of the problematic image files (mostly those with “_schnitt” or “-1” in their names) were either downloaded again or taken as original files from NextCloud, edited (usually in Photoshop, for cropping), then uploaded again, always from Windows 10 PCs.

For the record: I still have no clue why DevKinsta cloning/syncing didn’t work as expected.

Maybe DevKinsta is using mysqldump/mariadb-dump without explicitly stating the default character set/collation, which may be needed for a successful recreation on import?

Also, manually doing an SQL export using Adminer (within DevKinsta) obviously didn’t work either. Or my problem was caused by extracting the tar.xz on MacOSX, then uploading the files compressed on MacOSX and unzipping them on the staging env.
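
Whether an archive already contains decomposed names can be checked before it is uploaded. The following Python sketch lists the members of a zip file whose names are not in NFC form, using the standard `zipfile` module. The helper name `non_nfc_members` is made up, and the in-memory archive only stands in for a real upload.

```python
import io
import unicodedata
import zipfile


def non_nfc_members(archive):
    """Return the member names of an open ZipFile that are not NFC-normalized."""
    return [
        name
        for name in archive.namelist()
        if unicodedata.normalize("NFC", name) != name
    ]


# Build a small in-memory archive as a stand-in for a real one (assumption).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("uploads/gru\u0308n.jpg", b"")  # decomposed name
    archive.writestr("uploads/blau.jpg", b"")        # plain ASCII name

with zipfile.ZipFile(buffer) as archive:
    for name in non_nfc_members(archive):
        print(ascii(name))
```

For a real file, open it with `zipfile.ZipFile("path/to/archive.zip")` instead of the in-memory buffer.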

The Duplicator plugin was also a huge disappointment (no useful error information or log info after long unsuccessful runs). Migrate Guru is not usable from a local DevKinsta environment.

Hi @Vitus

I am happy to hear that our support team was able to assist you in resolving the issue you were facing. I have also checked your staging and can confirm that the images with special characters (German) are now functioning as expected.

Regarding the mysqldump script of DevKinsta, I will forward your feedback to our DevKinsta developers for further review and action. Going forward, I hope that your experience with our local deployment tool, DevKinsta, will be smooth and hassle-free. :slight_smile: