zipper 2017-03-03 09:45:32
Hey, does anyone here know about the text lib and how it handles or doesn't handle unicode when packing and unpacking?
zipper 2017-03-03 09:45:32
e.g
zipper 2017-03-03 09:45:35
Data.Text.unpack $ Data.Text.pack "уч"
zipper 2017-03-03 09:45:39
> Data.Text.unpack $ Data.Text.pack "уч"
lambdabot 2017-03-03 09:45:43
error:
lambdabot 2017-03-03 09:45:43
Not in scope: 'Data.Text.unpack'
lambdabot 2017-03-03 09:45:43
No module named 'Data.Text' is imported.
zipper 2017-03-03 09:46:09
Anywho that should be isomorphic but doesn't seem to be
Tuplanolla 2017-03-03 09:47:25
@let import qualified Data.Text as Text
lambdabot 2017-03-03 09:47:27
Defined.
Tuplanolla 2017-03-03 09:47:28
> let x = "уч" in Text.unpack (Text.pack x) == x
lambdabot 2017-03-03 09:47:31
True
geekosaur 2017-03-03 09:47:34
> T.unpack $ T.pack "уч"
lambdabot 2017-03-03 09:47:38
error:
lambdabot 2017-03-03 09:47:38
Not in scope: 'T.unpack'
lambdabot 2017-03-03 09:47:38
Perhaps you meant 'BS.unpack' (imported from Data.ByteString)
geekosaur 2017-03-03 09:47:49
of course we couldn't be sensible enough to do that :/
zipper 2017-03-03 09:48:02
> Text.unpack $ Text.pack "уч"
lambdabot 2017-03-03 09:48:06
"\1091\1095"
zipper 2017-03-03 09:48:10
You see
Tuplanolla 2017-03-03 09:48:15
Nope.
zipper 2017-03-03 09:48:16
Not isomorphic
Tuplanolla 2017-03-03 09:48:26
> "уч"
lambdabot 2017-03-03 09:48:29
"\1091\1095"
zipper 2017-03-03 09:48:31
I just read http://stackoverflow.com/questions/24953125/how-to-convert-unicode-escape-sequence-to-unicode-string-in-haskell
zipper 2017-03-03 09:48:57
"If you putStrLn or write to any other handle, the result will display the proper unicode string"
zipper 2017-03-03 09:48:59
hmmmmm
geekosaur 2017-03-03 09:49:31
> text $ Text.unpack $ Text.pack "уч"
lambdabot 2017-03-03 09:49:40
уч
geekosaur 2017-03-03 09:49:41
nothing to do with Data.Text; this is Show
geekosaur 2017-03-03 09:49:42
on String
geekosaur 2017-03-03 09:50:05
(pedantically, showList @Char)
zipper 2017-03-03 09:50:34
hmmmm I didn't know about text
monochrom 2017-03-03 09:50:36
zipper: It is a great lesson to learn that WYS is not WYG.
zipper 2017-03-03 09:50:36
:t text
lambdabot 2017-03-03 09:50:38
String -> Doc
zipper 2017-03-03 09:50:55
monochrom: WYS? WYG?
monochrom 2017-03-03 09:51:04
You heard of WYSIWYG?
geekosaur 2017-03-03 09:51:07
it's a hack, a pretty-printer combinator whose Show instance is literal (violating expectations for Show...)
AbstractLion 2017-03-03 09:51:15
What you see is not what you get
geekosaur 2017-03-03 09:52:11
basically Show is expected to produce something that would be valid Haskell source code from a maximally compatible character set, so pretty much everything above U+007E gets escaped
zipper 2017-03-03 09:52:45
Oh yeah
geekosaur 2017-03-03 09:53:15
(and below U+0020)
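[Editor's note] A minimal sketch of the point geekosaur is making, using only base. The names `sample`, `shown`, and `codePoints` are made up for illustration. It shows that the escapes appear only in the output of `show`, while the underlying data is still two characters.

```haskell
-- The string from the discussion: two Cyrillic letters, U+0443 and U+0447.
sample :: String
sample = "уч"

-- What print (and therefore GHCi or lambdabot) displays: a Haskell string
-- literal in which non-ASCII characters are escaped as decimal code points.
shown :: String
shown = show sample

-- The escaping exists only in the rendering; the data is still two Chars.
codePoints :: [Int]
codePoints = map fromEnum sample
```

In GHCi, `putStrLn shown` should display the same `"\1091\1095"` that lambdabot printed, and `read shown :: String` gives back `sample` unchanged.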
monochrom 2017-03-03 09:53:45
WYSIWYG is something we tell end-users, and we strive to code up things to live up to it, because we want to make things easy for end-users and help them get things done.
monochrom 2017-03-03 09:54:27
But we programmers deal with crossing abstraction boundaries and differing representations and differing renders and this and that. WYSIWYG is untrue for us.
monochrom 2017-03-03 09:55:29
If you see two different renderings, it doesn't mean that they have different backing data.
monochrom 2017-03-03 09:56:12
If you see two same renderings, it still doesn't mean that they have identical backing data.
geekosaur 2017-03-03 09:56:18
..and unicode is especially evil there
geekosaur 2017-03-03 09:56:18
unless you enforce normalization
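[Editor's note] A small sketch of "same rendering, different backing data", using only base. The names are hypothetical. The two strings below normally display as the same accented letter, but one is a single precomposed code point and the other is a base letter plus a combining accent.

```haskell
-- "é" as one precomposed code point (U+00E9, decimal 233).
precomposed :: String
precomposed = "\233"

-- "é" as "e" followed by a combining acute accent (U+0301, decimal 769).
decomposed :: String
decomposed = "e\769"

-- Plain equality compares code points, so these are different data
-- even though they usually look identical on screen.
sameData :: Bool
sameData = precomposed == decomposed
```

Comparing such strings meaningfully requires normalizing both sides first, which base does not provide; that would need a separate Unicode normalization library.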
monochrom 2017-03-03 09:57:10
For the backing data, trust only a hex editor.
zipper 2017-03-03 09:57:23
Thinking about how this Doc type will typecheck
monochrom 2017-03-03 09:57:31
Even REPLs and editors can lie.
geekosaur 2017-03-03 09:58:01
you don't need it most likely; we use it in lambdabot because you can't putStr / putStrLn
geekosaur 2017-03-03 09:58:05
it's a hack
monochrom 2017-03-03 09:58:20
Well, not lie, but they do various translations that may be convenient for one purpose but unfaithful for all others.
zipper 2017-03-03 09:58:36
I also don't want to putStrLn because it'll print
geekosaur 2017-03-03 09:58:41
the only thing you need to do in a program, usually, is avoid use of show on String or Text
geekosaur 2017-03-03 09:58:50
what are you doing with the data?
geekosaur 2017-03-03 09:59:06
more precisely, what are you doing that might make use of show / showsPrec?
zipper 2017-03-03 09:59:13
I want to do something with the string
geekosaur 2017-03-03 09:59:26
that is a non-answer
zipper 2017-03-03 09:59:36
geSorry
zipper 2017-03-03 09:59:40
geekosaur: Sorry
geekosaur 2017-03-03 09:59:47
I think you still are not getting it, you still believe that either String or Text is causing this
geekosaur 2017-03-03 09:59:54
neither is. *Show* is causing it.
zipper 2017-03-03 09:59:56
I'm fetching HTML page metadata
geekosaur 2017-03-03 10:00:34
make the Show go away.
zipper 2017-03-03 10:00:34
I'm not using show
zipper 2017-03-03 10:00:34
Oh
geekosaur 2017-03-03 10:00:34
then something you are doing is
zipper 2017-03-03 10:00:47
Yea the lib tagsoup probably is
zipper 2017-03-03 10:00:57
https://hackage.haskell.org/package/tagsoup-0.14.1/docs/Text-HTML-TagSoup.html
zipper 2017-03-03 10:01:31
`renderTagsOptions renderOptions{optEscape = id} $ parseTags "&& Слу - YouTube "`
zipper 2017-03-03 10:01:44
It is printing in the repl
geekosaur 2017-03-03 10:01:56
ghci is using Show behind your back
geekosaur 2017-03-03 10:02:00
do not trust its output
monochrom 2017-03-03 10:02:16
zipper 2017-03-03 10:02:28
hmmm in the end the program will end up printing this so it'll use show, right?
geekosaur 2017-03-03 10:02:38
(specifically, it does `print it` after evaluation --- and print = putStrLn . show)
geekosaur 2017-03-03 10:02:59
only if you use `print` instead of `putStrLn` or whatever
monochrom 2017-03-03 10:03:02
In the end it's your program and you choose putStr vs print.
zipper 2017-03-03 10:03:26
monochrom: hmmm
zipper 2017-03-03 10:03:32
It's not a big program
zipper 2017-03-03 10:03:34
One sec
monochrom 2017-03-03 10:04:09
Use hPutStr and save it to a file and use a hex editor to examine it.
zipper 2017-03-03 10:04:30
It's an IRC bot https://github.com/nairobilug/nairobi-bot/blob/master/src/Bot/URL.hs
zipper 2017-03-03 10:04:51
Some dudes using cyrillic text got so pissed
zipper 2017-03-03 10:05:05
I'll find out what is using the show
monochrom 2017-03-03 10:05:24
And the real story gets more complicated than just hPutStr
monochrom 2017-03-03 10:05:40
Because there is the question of "is it UTF-8 or what?"
monochrom 2017-03-03 10:06:28
IIRC the text library has a function for "translate Text to UTF-8 bytestring". This is preferred. Send the bytestring to file or stdout.
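[Editor's note] The function monochrom is recalling is most likely `Data.Text.Encoding.encodeUtf8`. The sketch below assumes the `text` and `bytestring` packages are available; `toUtf8`, `writeUtf8`, and `sampleBytes` are illustrative names, not part of either library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as Text
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)
import System.IO (Handle)

-- Encode Text as UTF-8 bytes. The result does not depend on the locale.
toUtf8 :: Text.Text -> BS.ByteString
toUtf8 = TE.encodeUtf8

-- Write the UTF-8 bytes to a handle without ever going through String.
writeUtf8 :: Handle -> Text.Text -> IO ()
writeUtf8 h = BS.hPutStr h . toUtf8

-- The bytes a hex editor would show for the string from the discussion.
sampleBytes :: [Word8]
sampleBytes = BS.unpack (toUtf8 (Text.pack "уч"))
```

For "уч" the bytes should be D1 83 D1 87, two bytes per letter, which is what a hex editor would show after `writeUtf8` has written them to a file handle.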
monochrom 2017-03-03 10:07:15
Do not even bother with String because String's putStr/hPutStr is locale-dependent and that's a whole can of worms there.
zipper 2017-03-03 10:07:15
It is utf-8 because we get a unicode codepoint, no?
monochrom 2017-03-03 10:07:24
No.
monochrom 2017-03-03 10:07:46
Text is not already utf-8. Text does its own thing.
monochrom 2017-03-03 10:08:00
I can tell you what it is, but it is only going to harm you.
zipper 2017-03-03 10:08:12
monochrom: How so?
monochrom 2017-03-03 10:08:18
Better think of it as abstract and/or "may change arbitrarily tomorrow"
zipper 2017-03-03 10:08:29
lol ok
monochrom 2017-03-03 10:08:51
Because it is actually irrelevant to input and output formats.
Clint 2017-03-03 10:09:27
zipper: do not use Data.ByteString.Char8
monochrom 2017-03-03 10:09:38
Notice that every I/O function of text is explicitly documented with what format it uses.
monochrom 2017-03-03 10:10:02
And there is none that says "I use Text's internal storage format verbatim".
monochrom 2017-03-03 10:10:09
So you may as well not care.
monochrom 2017-03-03 10:11:03
String is the opposite. String tells you the internal storage format (32-bit int per Char) but the I/O format is practically unspecified.
monochrom 2017-03-03 10:11:21
(Well, "locale-dependent". Does that tell you anything?)
zipper 2017-03-03 10:13:06
Yes it does
monochrom 2017-03-03 10:13:21
OK, what does that tell you?
zipper 2017-03-03 10:13:40
If I had set a locale to something, maybe Russian
zipper 2017-03-03 10:13:44
It would have worked
zipper 2017-03-03 10:13:51
But mine is en_US
monochrom 2017-03-03 10:14:06
Which encoding for the Russian? KOI? UTF-8? UTF-16LE?
Clint 2017-03-03 10:14:42
it would not have worked, because you are using Data.ByteString.Char8 to convert to String
zipper 2017-03-03 10:14:56
I tried `Data.ByteString.Char8.unpack $ Data.Text.Encoding.encodeUtf8 $ Text.pack "уч"` nothing
zipper 2017-03-03 10:15:06
Although Clint did say not to use Char8
Clint 2017-03-03 10:15:09
zipper: do not use Data.ByteString.Char8
monochrom 2017-03-03 10:15:11
Yeah, cut the Char8.
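[Editor's note] A sketch of why the `Data.ByteString.Char8.unpack` attempt above does not give back the original text, assuming the `text` and `bytestring` packages. The names `utf8Bytes`, `viaChar8`, and `viaDecode` are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BSC
import qualified Data.Text as Text
import qualified Data.Text.Encoding as TE

-- "уч" encoded as UTF-8: four bytes for two letters.
utf8Bytes :: BS.ByteString
utf8Bytes = TE.encodeUtf8 (Text.pack "уч")

-- Char8.unpack turns every byte into one Char, so the result is four
-- Latin-1 characters rather than the two Cyrillic letters.
viaChar8 :: String
viaChar8 = BSC.unpack utf8Bytes

-- Decoding the bytes as UTF-8 restores the original two characters.
-- decodeUtf8 throws on invalid input; decodeUtf8' returns an Either instead.
viaDecode :: String
viaDecode = Text.unpack (TE.decodeUtf8 utf8Bytes)
```

For bytes that arrive over the network, `decodeUtf8'` is probably the safer choice, since the sender may not actually be sending valid UTF-8.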
zipper 2017-03-03 10:15:12
monochrom: idk what encoding it is
monochrom 2017-03-03 10:15:49
There you go. It doesn't tell you anything.
zipper 2017-03-03 10:15:57
monochrom: It doesn't help the result though
zipper 2017-03-03 10:16:02
Dropping the Char8
zipper 2017-03-03 10:16:15
I think I should take a nap and look at this with fresh eyes
monochrom 2017-03-03 10:16:24
In fact why unpack at all?
bitf 2017-03-03 10:16:27
hey guys, i have a question from the book, Learn You a Haskell:
zipper 2017-03-03 10:16:30
I believe tagsoup is doing something fishy
bitf 2017-03-03 10:16:35
instance Monad Maybe where
bitf 2017-03-03 10:16:35
return x = Just x
bitf 2017-03-03 10:16:35
Nothing >>= f = Nothing
bitf 2017-03-03 10:16:35
Just x >>= f = f x
bitf 2017-03-03 10:16:38
fail _ = Nothing
bitf 2017-03-03 10:16:51
this is in the section on monads
bitf 2017-03-03 10:17:06
does the just x line make sense
monochrom 2017-03-03 10:17:11
Do you trust me? If so, screw bytestring unpack, just use Data.ByteString.hPutStr directly.
zipper 2017-03-03 10:17:17
monochrom: I assumed unpack is the only way it would stop being a bytestring and be a string
zipper 2017-03-03 10:17:30
monochrom: I sure do trust you
monochrom 2017-03-03 10:17:34
No, screw String altogether, that's what I'm saying and you proved.
bitf 2017-03-03 10:17:34
shouldn't it be Just x >>= f = Just f x
Sornaensis 2017-03-03 10:17:35
bitf: which one
Sornaensis 2017-03-03 10:17:46
:t (>>=)
lambdabot 2017-03-03 10:17:48
Monad m => m a -> (a -> m b) -> m b
Sornaensis 2017-03-03 10:18:00
bitf: notice the type of the second parameter
bitf 2017-03-03 10:18:09
ok
zipper 2017-03-03 10:18:25
:t Data.ByteString.hPutStr
lambdabot 2017-03-03 10:18:27
GHC.IO.Handle.Types.Handle -> BSC.ByteString -> IO ()
bitf 2017-03-03 10:19:01
so the Just is implicit in the f...?
zipper 2017-03-03 10:19:07
I really don't want something that returns a unit
zipper 2017-03-03 10:19:11
I need this string
monochrom 2017-03-03 10:19:12
Remember WYS is not WYG? If you go through String and System.IO, you are incurring more of that, not less. You are incurring more mutilating translations.
Sornaensis 2017-03-03 10:19:28
bitf: f is a function that will return a (Maybe b)
Sornaensis 2017-03-03 10:19:42
when you give it an a
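[Editor's note] A short sketch of Sornaensis's answer, using only base. `halve`, `quartered`, and `stuck` are made-up names. Because the function passed to `>>=` already returns a `Maybe`, the `Just x >>= f = f x` equation needs no extra `Just`; wrapping the result again as `Just (f x)` would have type `Maybe (Maybe b)`.

```haskell
-- A function of the shape (>>=) expects: it takes a plain value and
-- returns a Maybe on its own.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- Just 12 >>= halve gives Just 6, and halving again gives Just 3.
quartered :: Maybe Int
quartered = Just 12 >>= halve >>= halve

-- Just 6 >>= halve gives Just 3, but 3 is odd, so the chain ends in Nothing.
stuck :: Maybe Int
stuck = Just 6 >>= halve >>= halve
```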
zipper 2017-03-03 10:20:30
monochrom: :) let me take a small break
zipper 2017-03-03 10:20:42
I'm going to make sense of this scrollback
monochrom 2017-03-03 10:20:53
You already have the Text version and the ByteString version. If you pass them as parameters to Data.ByteString.hPutStr, it does not mean that you will lose the original Text or the original ByteString. You understand that?
zipper 2017-03-03 10:21:17
Oh I see with Data.ByteString.hPutStr I write to the handle and read from it right after
zipper 2017-03-03 10:21:26
*write to the file
zipper 2017-03-03 10:21:34
Sounds like it would blow up at scale
zipper 2017-03-03 10:21:42
file descriptors and all
zipper 2017-03-03 10:21:59
monochrom: I do now
monochrom 2017-03-03 10:22:24
OK I give up. https://xkcd.com/763/
bitf 2017-03-03 10:22:24
ah yes i understand now. thanks for help Sornaensis
zipper 2017-03-03 10:24:27
monochrom: LOL I'm kinda sleep deprived
zipper 2017-03-03 10:24:35
I'll look at it in a few
lpaste_ 2017-03-03 10:25:11
f-a pasted "Could not match type `m` with `m1`" at http://lpaste.net/353191
zipper 2017-03-03 10:26:03
monochrom: I believe tagsoup forces me to go through String or at least something they call StringLike
zipper 2017-03-03 10:26:28
"A class to generalise TagSoup parsing over many types of string-like types. "
mauke 2017-03-03 10:27:11
f-a: that's a different type
mauke 2017-03-03 10:27:26
your signature creates a new type variable 'm'
mauke 2017-03-03 10:27:33
unrelated to the 'm' in run's type
f-a 2017-03-03 10:28:10
thanks mauke. Any way to tell ghc "hey they are the same"?
f-a 2017-03-03 10:28:58
I can of course omit to write them, but with them it is easier to read the code
mauke 2017-03-03 10:29:13
look into ScopedTypeVariables
geekosaur 2017-03-03 10:29:19
f-a, ScopedTypeVariables and use `forall m.` in the signature
f-a 2017-03-03 10:29:42
mhh I had {-# Language ScopedTypeVariables #-}
f-a 2017-03-03 10:29:45
not the forall though
geekosaur 2017-03-03 10:30:00
right, you need the forall to tell it *which* type variables to scope-extend
f-a 2017-03-03 10:30:53
forall m t i o. (MonadIO m, Integral t) did it
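[Editor's note] f-a's paste is not reproduced in the log, so the following is a hypothetical stand-in (`pairUp` is a made-up function) that shows the same mechanism. With `ScopedTypeVariables`, the explicit `forall a.` makes the `a` in the local signature refer to the outer `a`.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

-- The explicit forall brings 'a' into scope for the whole definition,
-- so the signature in the where clause refers to the same 'a'.
pairUp :: forall a. [a] -> [(a, a)]
pairUp xs = zip xs shifted
  where
    shifted :: [a]
    shifted = drop 1 xs
```

Without the `forall`, the `a` in `shifted :: [a]` would be a fresh type variable, and GHC would reject the definition with the same kind of "could not match" error f-a pasted.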
f-a 2017-03-03 10:31:01
thanks mauke & geekosaur
d0t 2017-03-03 10:34:08
ohai. Does intero work with .hsc files?
zipper 2017-03-03 10:39:27
monochrom: Ok seems to me I could solve this simply by setting locale. However given that it already is set to UTF-8 it should just work. It is utf-8 in that "en_US.UTF-8"
Clint 2017-03-03 10:40:01
zipper: are you missing that Data.ByteString.Char8 breaks anything but ASCII?
zipper 2017-03-03 10:40:30
Ch3ck: I won't use a bytestring
zipper 2017-03-03 10:40:40
hmmm
zipper 2017-03-03 10:40:57
I read a bytestring over the network though
zipper 2017-03-03 10:40:59
FML
Clint 2017-03-03 10:41:49
you read a bytestring over the network. you write a bytestring over the network.
zipper 2017-03-03 10:43:08
Clint: I agree