Resolved Problem with compressed XML sitemap

Status
Not open for further replies.

costas1

Member
Messages
134
Reaction score
2
Points
18
I figured out what I did wrong yesterday. I thought I had disabled .gz file filtering, but I used the directive incorrectly. After an hour, I was pretty fed up with troubleshooting this.

Put this in the .htaccess file in the folder where you keep your sitemaps:

Code:
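<FilesMatch "[.]xml[.]gz$">
    # Label the response as a gzip file (requires mod_headers)
    Header set Content-Type "application/x-gzip"
    # Drop Content-Encoding so clients do not decompress the file on the fly
    Header unset Content-Encoding
    # Tell mod_deflate not to compress these responses again
    SetEnv no-gzip 1
    # Remove any output filter mapped to the .gz extension
    RemoveOutputFilter gz
</FilesMatch>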

Some of these may not be required, but this is what got it working for me, so you can play with the directives if you think you need to. The important one appears to be RemoveOutputFilter.

So, the way this works is:
If a file name matches the pattern "[.]xml[.]gz$", the rules inside the block are applied. Note that in regular expressions the "$" has a special meaning: it anchors the match to the end of the name, so the file name has to end with .xml.gz. Something like .xml.gz.exe will not match, because it doesn't end with .xml.gz. The periods are inside square brackets because a bare period also has a special meaning (it matches any single character). Inside square brackets, a period loses that special meaning and matches only a literal period. I don't know if that's strictly necessary for .htaccess specifically, but it works and the square brackets don't hurt anything.
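To make the "$" anchoring concrete, here is a minimal sketch in Python. Apache uses PCRE rather than Python's `re` module, but for a pattern this simple the two should behave the same. The function name `is_compressed_sitemap` and the file names are made up for illustration.

```python
import re

# Same pattern as in the FilesMatch directive.
SITEMAP_PATTERN = re.compile(r"[.]xml[.]gz$")


def is_compressed_sitemap(filename):
    """Return True if the file name ends with a literal ".xml.gz"."""
    # search() is not anchored at the start, so any prefix is allowed;
    # the "$" forces the match to sit at the end of the name.
    return SITEMAP_PATTERN.search(filename) is not None
```

With this sketch, `is_compressed_sitemap("sitemap.xml.gz")` is true, while `"sitemap.xml.gz.exe"` and `"sitemap.xml"` are both rejected.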


I use the "\" to escape the "." (see https://httpd.apache.org/docs/2.4/rewrite/intro.html#regex); otherwise the "." will match any single character. The brackets match one character out of a set of characters, so I guess "\." and "[.]" are essentially identical.

This is the version I used:

Code:
<FilesMatch "\.xml\.gz$">
    Header unset Content-Encoding
</FilesMatch>




That code appears to work with all the browsers I tried: Chrome, Edge, Firefox and Opera.

The "RemoveOutputFilter" directive didn't seem to do anything for me. Is it the same for you? What happens if you try my version of the code?
 

garrettroyce

Community Support
Messages
5,603
Reaction score
246
Points
63
[] means any one of the characters in the brackets, and since . is the only character, only it can be matched. \. means only a literal . can be matched. It's a matter of preference. I try to avoid escape character hell with things like /\/\ and "/"" because they're hard to read, but it's just my style. When you work with code where the escapes themselves have to be escaped, it can get really ugly and you end up with things like quadruple backslashes. Brackets don't normally need to be escaped, so you avoid double escapes. Ironically, you could actually leave it as just ".", but in the rare case that you had a file named something like "filexxmlxgz", it would match too. I had it that way initially and then read that FilesMatch uses regex and not glob, but it was technically working.
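A quick way to compare the three spellings side by side is a small Python sketch like the one below. Python's `re` stands in for Apache's PCRE here, which is an assumption, though these patterns use nothing engine-specific. The dictionary keys and the helper name `matching_patterns` are invented for the example.

```python
import re

# The three ways of writing the pattern discussed in this thread.
PATTERNS = {
    "bracketed": r"[.]xml[.]gz$",
    "escaped": r"\.xml\.gz$",
    "unescaped": r".xml.gz$",
}


def matching_patterns(filename):
    """Return the names of the patterns that match the given file name."""
    return [
        name
        for name, pattern in PATTERNS.items()
        if re.search(pattern, filename)
    ]
```

For a normal name like "sitemap.xml.gz" all three patterns match, but for "filexxmlxgz" only the unescaped one does, which is the rare false positive described above.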

Anyway, I tried so many things that I can't remember which did what. I also tested with a combination of curl and Vivaldi, and when something didn't work in one, I didn't try it in the other, so I may have missed some combination.

The strange thing is that curl worked no matter what settings I had, and Vivaldi (which is built on open source Chromium) didn't work unless I did what I posted (or, apparently, your shortened version). So what's the difference? Clearly it's not the server, because it sends the same headers and file data, so the browser is doing something weird. Setting the headers forces the browser to behave normally, so I stand by my initial statement that it's a browser thing, and the fact that there's no setting to change this behavior makes it really annoying. It doesn't happen with any other file type, either, so it's definitely not consistent. But every browser does it, so it's a standardized bug, except for curl? /rant
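One way to confirm or rule out a server-side difference is to request the file twice, once without advertising gzip support and once with an Accept-Encoding header like a browser sends, and compare what comes back. The following is only a rough sketch using Python's standard library; the helper names `build_request`, `summarize` and `probe` are hypothetical, and the URL you pass in is your own sitemap link.

```python
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def build_request(url, accept_gzip):
    """Build a GET request, optionally advertising gzip support like a browser."""
    request = urllib.request.Request(url)
    if accept_gzip:
        request.add_header("Accept-Encoding", "gzip")
    return request


def summarize(headers, body):
    """Reduce a response to the three facts that matter for this problem."""
    return {
        "content_type": headers.get("Content-Type"),
        "content_encoding": headers.get("Content-Encoding"),
        "body_is_gzip": body.startswith(GZIP_MAGIC),
    }


def probe(url):
    """Fetch the URL both ways and return a summary for each request."""
    results = {}
    for label, accept_gzip in (("plain", False), ("browser-like", True)):
        with urllib.request.urlopen(build_request(url, accept_gzip)) as response:
            # urllib does not decompress, so this is the body exactly as sent.
            results[label] = summarize(response.headers, response.read())
    return results
```

Calling `probe()` with the sitemap URL and printing the result shows both summaries. If the two differ, for example if only the "browser-like" one carries a Content-Encoding header, that would suggest the server answers the two kinds of client differently; if they are identical, the difference really is on the client side.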

The important thing is that it works. If you want to post here, or PM me the new link, I'll try it in Vivaldi and curl, but since curl always worked and Vivaldi is basically Chromium, I don't think I have anything new to contribute.
 

costas1

Member
Messages
134
Reaction score
2
Points
18
I also consider the issue resolved. There is no new link. I'm testing with the same link I sent you by PM.
 

garrettroyce

Community Support
Messages
5,603
Reaction score
246
Points
63
I have no PM from you, but as long as you're happy, then everything is good.
 

costas1

Member
Messages
134
Reaction score
2
Points
18
For me the backslash is more of a habit from other programming languages.
 