Helge Oldach
2021-08-20 09:03:26 UTC
Hi all,
I'm confused about FreeBSD's behaviour with respect to locales
and grep. Specifically, case sensitivity does not seem to be handled
consistently when grepping character ranges. FreeBSD 11 and 13 give
different results for the same locale, and I'm unclear why.
# uname -a
FreeBSD 11STABLE 11.4-STABLE FreeBSD 11.4-STABLE #1059 r368289M: Thu Dec 3 01:48:30 UTC 2020 ***@XXX amd64
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
bla
Bla
#
# uname -a
FreeBSD 13STABLE 13.0-STABLE FreeBSD 13.0-STABLE #49 stable/13-n246779-64085efb677-dirty: Mon Aug 16 08:42:53 CEST 2021 ***@XXX amd64
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
bla
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
#
For comparison, a Linux RHEL box delivers the expected results:
# uname -a
Linux rhel.local 3.10.0-1062.9.1.el7.x86_64 #1 SMP Mon Dec 2 08:31:54 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
# export LANG=en_US.ISO8859-1
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=C
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
# export LANG=en_US.UTF-8
# (echo bla; echo Bla) | grep '[A-Z]'
Bla
#
There is nothing special in the environment; in particular, neither
LC_xxx nor MM_CHARSET is set in either case.
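To make the comparison easier to repeat on other machines, here is a minimal sketch of the same test as a script. It runs the two-line input through grep under each locale, once with the range [A-Z] and once with the POSIX character class [[:upper:]], so one can see whether the difference is tied to how the range is interpreted. The helper name check_range is made up for this illustration, and the script assumes the three locales are installed; where a locale is missing, grep will most likely fall back to the C locale without saying so. It sets LC_ALL rather than LANG so that no stray LC_* variable can interfere.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): print what grep matches
# under the given locale for the range [A-Z] and for [[:upper:]].
check_range() {
    loc=$1
    printf '%s range: ' "$loc"
    # LC_ALL overrides LANG and all LC_* variables for this one command.
    printf 'bla\nBla\n' | LC_ALL=$loc grep '[A-Z]' | tr '\n' ' '
    printf '| class: '
    printf 'bla\nBla\n' | LC_ALL=$loc grep '[[:upper:]]' | tr '\n' ' '
    printf '\n'
}

# Locale names as used in the transcripts above; assumed to be installed.
for loc in C en_US.ISO8859-1 en_US.UTF-8; do
    check_range "$loc"
done
```

If "bla" appears after "range:" but not after "class:" for some locale, the range is being expanded differently from the character class in that locale.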
Any guidance is appreciated... Thanks!
Kind regards
Helge
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de