Monday, February 6, 2017

#haskell channel featuring lambdabot, magthe, ongy, quchen, arne_, halogenandtoast, and 5 others.

halogenandtoast 2017-02-05 21:46:26
Does anyone have experience (or suggestions) for using regex-compat with japanese text?
halogenandtoast 2017-02-05 21:46:46
`*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 17,"illegal byte sequence"))` is what I'm currently dealing with.
halogenandtoast 2017-02-05 21:47:32
the call being `matchRegexAll (mkRegex "doesn't matter") "宮城県 津波で壊れた小学校の新しい建物ができる"`
Axman6 2017-02-05 21:50:47
are you sure things are in the right encodings?
Xnuk 2017-02-05 21:57:10
halogenandtoast: I cannot reproduce the error; it just returns `Nothing` at regex-compat-0.95.1 & regex-posix-0.95.2
halogenandtoast 2017-02-05 21:59:43
Xnuk: hmm not sure how to check my versions one sec.
halogenandtoast 2017-02-05 22:00:43
regex-compat 0.95.1 and regex-posix 0.95.2 is the same that I have
halogenandtoast 2017-02-05 22:00:53
Axman6: I'm never quite sure
halogenandtoast 2017-02-05 22:02:25
Is the encoding automatically applied / chosen for me when I create a string
halogenandtoast 2017-02-05 22:02:31
in my repl I'm just doing `let x = matchRegexAll (mkRegex "doesn't matter") "宮"`
halogenandtoast 2017-02-05 22:02:52
and when I try to check the value of x it fails with illegal byte sequence
Xnuk 2017-02-05 22:03:20
`String` is just a list of Unicode chars
halogenandtoast 2017-02-05 22:03:41
Sure and unicode should cover the characters I need (I think)
halogenandtoast 2017-02-05 22:04:35
Hmm when I type alternate Kanji, it works fine
halogenandtoast 2017-02-05 22:04:47
there might be an unprintable character I've copied and pasted
Xnuk 2017-02-05 22:06:03
Is your terminal's encoding set properly?
halogenandtoast 2017-02-05 22:06:56
not sure, a kanji I'm having trouble with seems to have the encoding "\23470"
halogenandtoast 2017-02-05 22:07:11
so `let x = matchRegexAll (mkRegex "doesn't matter") "\23470"`
halogenandtoast 2017-02-05 22:07:13
fails for me
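For reference, `"\23470"` is a decimal character escape: it names code point 23470, which is U+5BAE, the kanji 宮. A minimal sketch, using only `base`, that prints the code point and the bytes it becomes in UTF-8. The helper `utf8Bytes` is made up for illustration and ignores surrogates; it is not part of any library discussed here.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical helper: UTF-8 encode a single character by hand.
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: 10xxxxxx carrying the low six bits.
    cont x = 0x80 .|. (x .&. 0x3F)

main :: IO ()
main = do
  let c = '\23470'
  putStrLn ("code point: U+" ++ showHex (ord c) "")                       -- U+5bae
  putStrLn ("UTF-8 bytes: " ++ unwords [showHex b "" | b <- utf8Bytes c]) -- e5 ae ae
```

So the string is a perfectly valid single `Char` on the Haskell side; an "illegal byte sequence" complaint would have to come from whatever later interprets the encoded bytes.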
ongy 2017-02-05 22:07:23
> text "\23470"
lambdabot 2017-02-05 22:07:26
Xnuk 2017-02-05 22:08:01
hm that's awkward
halogenandtoast 2017-02-05 22:08:10
What is?
Xnuk 2017-02-05 22:09:29
I still get no errors
halogenandtoast 2017-02-05 22:09:57
*sigh*
geekosaur 2017-02-05 22:10:09
what's your current locale?
Xnuk 2017-02-05 22:10:44
en_US.UTF-8
halogenandtoast 2017-02-05 22:10:52
geekosaur: is there an easy way to check? LC_ALL is currently empty
geekosaur 2017-02-05 22:11:00
(also you may have to load full locale information instead of restricted ones, if you're using e.g. docket)
Xnuk 2017-02-05 22:11:06
$ locale
geekosaur 2017-02-05 22:11:08
halogenandtoast, the locale command
geekosaur 2017-02-05 22:11:26
but $LC_ALL takes precedence over $LANG
halogenandtoast 2017-02-05 22:11:35
LANG and LC_ALL are empty
geekosaur 2017-02-05 22:11:40
$LANG, $LC_ALL, $LC_category
geekosaur 2017-02-05 22:11:54
then your locale is probably C and those won't work
halogenandtoast 2017-02-05 22:12:04
LC_CTYPE="UTF-8" and the rest are "C"
geekosaur 2017-02-05 22:12:34
hm, typoed earlier. "docker"
halogenandtoast 2017-02-05 22:13:09
This is just on my local machine currently; would setting my locale fix this?
geekosaur 2017-02-05 22:13:11
so, exit ghci and rerun it as: LANG=en_US.UTF-8 ghci
geekosaur 2017-02-05 22:13:24
(or some other UTF8 locale if appropriate)
geekosaur 2017-02-05 22:15:17
hm, I don't think I have ja_JP locale foo installed on here :/
halogenandtoast 2017-02-05 22:15:30
I still get the same error.
halogenandtoast 2017-02-05 22:16:13
Is there a way to see which locale haskell thinks is the current one?
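One way to inspect this from inside a program, as a rough sketch using only `base`: `getLocaleEncoding` and `hGetEncoding` report what GHC's I/O layer picked up, and `lookupEnv` shows the raw environment variables. The helper `showVar` is a made-up name. This only shows GHC's view of text I/O; it is an assumption that it says anything about what the C library's regex functions see.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.Environment (lookupEnv)
import System.IO (hGetEncoding, stdout)

-- Hypothetical helper: render an environment variable that may be unset.
showVar :: String -> Maybe String -> String
showVar name val = name ++ " = " ++ maybe "(unset)" show val

main :: IO ()
main = do
  -- Encoding GHC derived from the locale.
  enc <- getLocaleEncoding
  putStrLn ("locale encoding: " ++ show enc)
  -- Encoding currently attached to stdout (Nothing means binary mode).
  out <- hGetEncoding stdout
  putStrLn ("stdout encoding: " ++ show out)
  -- The environment variables that feed into the locale.
  mapM_ (\name -> lookupEnv name >>= putStrLn . showVar name)
        ["LC_ALL", "LC_CTYPE", "LANG"]
```

Running it once normally and once as `LANG=en_US.UTF-8 ./prog` should make it visible whether the variable is reaching the process at all.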
cocreature 2017-02-05 22:16:16
maybe you need to generate the locale for this to work?
cocreature 2017-02-05 22:16:23
try "locale -a" to see the ones you have installed
geekosaur 2017-02-05 22:16:24
yeh, might need to install locale files
geekosaur 2017-02-05 22:16:59
as I just noted I don't seem to have any Japanese locales on here (was hoping to verify what locale would be suitable for a .jp address :)
geekosaur 2017-02-05 22:17:18
I should probably correct that
halogenandtoast 2017-02-05 22:17:30
geekosaur: what like `ja_JP.UTF-8`
geekosaur 2017-02-05 22:17:51
if that exists, yes
halogenandtoast 2017-02-05 22:18:10
yeah that doesn't seem to work, I seem to have a lot of locales available
halogenandtoast 2017-02-05 22:18:15
via `locale -a`
Xnuk 2017-02-05 22:18:32
https://wiki.archlinuxjp.org/index.php/%E3%83%AD%E3%82%B1%E3%83%BC%E3%83%AB
halogenandtoast 2017-02-05 22:18:35
including the `en_US.UTF-8` I'm currently trying
magthe 2017-02-05 22:18:46
Can I use PatternGuards in a case statement, or is it only for function defs?
geekosaur 2017-02-05 22:18:48
you need to use a utf8 locale from that list
geekosaur 2017-02-05 22:19:26
magthe, function defs turn into case statements, I would expect it to work
halogenandtoast 2017-02-05 22:19:49
my list shows en_US.UTF-8; I've set LC_ALL to that (and tried setting it when running ghc)
geekosaur 2017-02-05 22:20:17
and the manual just says "guards" which implies both case and function definitions
quchen 2017-02-05 22:20:56
magthe: Function definitions desugar to case blocks
arne_ 2017-02-05 22:21:28
hi it's me, asking the obvious troll questions: why isn't it better to use a non-functional language, and try to use its lambda stuff as much as we can?
quchen 2017-02-05 22:22:12
»has map and filter« does not mean »supports functional style«, it means »hey, me too«.
magthe 2017-02-05 22:22:21
then I'm missing something in syntax since I get parse errors
halogenandtoast 2017-02-05 22:22:41
arne_: troll answer: better is subjective.
arne_ 2017-02-05 22:22:48
halogenandtoast: eh, eh
geekosaur 2017-02-05 22:23:18
magthe, perhaps lpaste your code
arne_ 2017-02-05 22:23:18
no i mean seriously, if you use lambda expressions as they're supposed to be used, you're not losing anything right? except type-safety, and that maybe can be patched in in most languages?
geekosaur 2017-02-05 22:23:20
@paste
lambdabot 2017-02-05 22:23:20
Haskell pastebin: http://lpaste.net/
magthe 2017-02-05 22:23:26
geekosaur, quchen: then I'm missing something in syntax since I get parse errors... all examples I see are of function defs :(
quchen 2017-02-05 22:23:30
> case Just 1 of y | Just x <- y -> x -- magthe
lambdabot 2017-02-05 22:23:32
1
magthe 2017-02-05 22:24:41
quchen: ah, it was that vertical bar that was missing! thanks!
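A slightly longer sketch of the same idea, with a pattern guard and a boolean guard inside `case` alternatives. The function name `classify` is invented for the example.

```haskell
-- Guards in a case alternative start with '|' after the pattern,
-- exactly as they do in a function definition.
classify :: Maybe Int -> String
classify m = case m of
  y | Just x <- y, x > 0 -> "positive " ++ show x      -- pattern guard plus boolean guard
    | Just x <- y        -> "non-positive " ++ show x  -- pattern guard alone
  _                      -> "nothing"                  -- reached when every guard above fails

main :: IO ()
main = mapM_ (putStrLn . classify) [Just 3, Just 0, Nothing]
```

When all guards of an alternative fail, matching falls through to the next alternative, which is why `Nothing` ends up in the `_` case.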
halogenandtoast 2017-02-05 22:25:04
arne_: the way oop, functional, and other paradigm languages are written is all different. Sure you can use lambdas in non functional languages but that will only get you so far and at the end of the day if that's how you want to write your programs you're missing out on a language that fully supports that style
halogenandtoast 2017-02-05 22:25:10
anyways back to my encoding problem!
halogenandtoast 2017-02-05 22:25:14
I must figure this out
halogenandtoast 2017-02-05 22:25:23
I just want to replace some stupid text in my string
quchen 2017-02-05 22:25:28
arne_: I know of no language with a »patched in« type system that works. C++ templates are completely broken, Java Generics are completely broken.
reactormonk 2017-02-05 22:25:42
Do I violate the lens laws if I have a prism that lenses to a left/right depending on the input on a set?
reactormonk 2017-02-05 22:26:21
... yes, put put.
quchen 2017-02-05 22:28:10
magthe: Pattern *guards* :-)
cocreature 2017-02-05 22:28:21
halogenandtoast: do the japanese characters appear correctly in GHCi? when I set LANG=C I just get question marks whereas they appear correctly with LANG=en_US.UTF-8
halogenandtoast 2017-02-05 22:28:53
putStrLn "\23470"
halogenandtoast 2017-02-05 22:28:54
宮
halogenandtoast 2017-02-05 22:29:01
that's my output
halogenandtoast 2017-02-05 22:29:05
they seem to appear correctly
Xnuk 2017-02-05 22:29:15
geekosaur 2017-02-05 22:29:35
so maybe it's your system's regcomp etc. that don't properly support utf8
geekosaur 2017-02-05 22:30:14
you may need to install a UTF8 aware PCRE and use pcre-light or the Text.Regex PCRE module
halogenandtoast 2017-02-05 22:30:32
I don't know what regcomp is but I'd be surprised if OSX messed this up.
ongy 2017-02-05 22:30:47
doesn't osx use a 16bit encoding?
geekosaur 2017-02-05 22:30:52
it's the POSIX regular expression compiler function (goes along with regexec which actually does matching, etc.)
halogenandtoast 2017-02-05 22:30:57
ongy: probably
halogenandtoast 2017-02-05 22:31:20
I have no knowledge providing me with a way to determine that
geekosaur 2017-02-05 22:31:22
and, OS X is not guaranteed to get them right *for the BSD API*, only for Aqua / CoreFoundation and friends
magthe 2017-02-05 22:33:23
quchen: true, but I don't think I've even seen a 'case' that does more than pattern match and therefore leaves out the '|'
quchen 2017-02-05 22:36:11
magthe: A hack that one can use instead of -XMultiWayIf is »case () of _ | p1 -> … | p2 -> …«
quchen 2017-02-05 22:36:38
A bit noisy, but does the trick
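Spelled out, the trick might look like the sketch below, next to the `MultiWayIf` version for comparison. The function names `sign` and `sign'` are made up for the example.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Plain Haskell: match on () just to get a place to hang guards.
sign :: Int -> String
sign n = case () of
  _ | n < 0     -> "negative"
    | n == 0    -> "zero"
    | otherwise -> "positive"

-- The same logic with the MultiWayIf extension.
sign' :: Int -> String
sign' n = if | n < 0     -> "negative"
             | n == 0    -> "zero"
             | otherwise -> "positive"

main :: IO ()
main = mapM_ (\n -> putStrLn (sign n ++ " / " ++ sign' n)) [-2, 0, 7]
```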
geekosaur 2017-02-05 22:38:23
halogenandtoast, another alternative if the POSIX regex functions are broken is to use Text.Regex.TDFA
geekosaur 2017-02-05 22:38:34
it's slower but pure haskell and guaranteed to handle utf8 properly
geekosaur 2017-02-05 22:38:49
(provided locale is correct for feeding data into / out of your program)
geekosaur 2017-02-05 22:39:34
(regex-tdfa package if needed)
halogenandtoast 2017-02-05 22:40:49
okay I'll try that.
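A rough sketch of what that could look like, assuming the `regex-tdfa` package is installed. `=~` is the package's match operator; `replaceAll` is a hypothetical helper written here for illustration (it is not part of regex-tdfa), and it stops at the first empty match, so patterns that can match the empty string are not handled.

```haskell
import Text.Regex.TDFA ((=~))

-- Hypothetical helper: replace every non-empty match of a pattern.
-- The (before, match, after) result has an empty match when nothing matched.
replaceAll :: String -> String -> String -> String
replaceAll pat rep = go
  where
    go s = case (s =~ pat :: (String, String, String)) of
      (_, "", _)         -> s                         -- no (or empty) match: stop
      (before, _, after) -> before ++ rep ++ go after -- keep going on the rest

main :: IO ()
main = do
  let s = "宮城県 津波で壊れた小学校の新しい建物ができる"
  print (s =~ "小学校" :: Bool)
  putStrLn (replaceAll "小学校" "中学校" s)
```

Because each remaining piece is matched again from its own start, anchors such as `^` would behave differently than in a single-pass replace; for simple literal or character-class patterns that should not matter.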
magthe 2017-02-05 22:45:00
quchen: ah, nice!