vctrh 2017-03-04 03:49:50
hey
martinga 2017-03-04 03:54:59
I've written a library for accessing patent data from the EPO, and a client using it to download PDFs. I think it is worth putting out there on Hackage for others, but it is my first significant bit of code. What's the best way to ask the community for a quick review / sanity check? Library is at https://github.com/fros1y/patent-api and the tool is at https://github.com/fros1y/epo-download
lyxia 2017-03-04 03:56:45
You could try the Haskell-café ML, or reddit.
hpc 2017-03-04 03:58:24
ML? in #haskell?
hodapp 2017-03-04 03:59:50
it's a big conspiracy, don't worry about it
lyxia 2017-03-04 04:00:36
Machine Learning is the future.
martinga 2017-03-04 04:01:01
It's Sneaky ML. You know, SML.
vctrh 2017-03-04 04:06:20
there are a few of us doing machine learning in haskell
hodapp 2017-03-04 04:12:43
there's DataHaskell on Gitter
hodapp 2017-03-04 04:12:55
ML libraries in Haskell seem to be rough though
hodapp 2017-03-04 04:13:32
I haven't attempted to use HLearn yet but looked through some examples, and there are a couple convnet things too that I'd like to look at
hodapp 2017-03-04 04:14:30
but every time I use R, I think about how this could be made SO MUCH CLEANER by using actual types instead of a lot of monkey-patching and duct-tape
hodapp 2017-03-04 04:15:20
I suppose HaskellR is still a thing.
vctrh 2017-03-04 04:17:17
hodapp yeah it could be so much better than R
vctrh 2017-03-04 04:17:35
hlearn is not very active
hodapp 2017-03-04 04:17:55
it just seems to be the most active :|
vctrh 2017-03-04 04:18:21
https://github.com/mikeizbicki/HLearn/graphs/contributors
vctrh 2017-03-04 04:18:27
kind of stalled for the last year+
vctrh 2017-03-04 04:18:51
i think datahaskell kind of has to build things up anew
hodapp 2017-03-04 04:19:09
yeah. I'd like to help with some of that but just haven't had time as of late
hodapp 2017-03-04 04:19:25
someone had a dataframe-like API (not Frames) that looked promising
vctrh 2017-03-04 04:19:33
as a lot of the one-man attempts at this stuff have mostly fizzled
vctrh 2017-03-04 04:20:00
was this from the other day? i haven't seen anything too promising in that arena yet
hodapp 2017-03-04 04:20:12
it was from some months ago
hodapp 2017-03-04 04:20:18
http://hackage.haskell.org/package/analyze
hodapp 2017-03-04 04:20:53
if we had something that could handle most of the data-juggling stuff from the Hadleyverse that would go amazingly far
hodapp 2017-03-04 04:21:03
e.g. reshape2, dplyr and friends
vctrh 2017-03-04 04:21:14
trying to find a tutorial or example of this...
vctrh 2017-03-04 04:21:36
actually i had an idea in the dataframe space i'm experimenting with
hodapp 2017-03-04 04:21:54
yeah?
vctrh 2017-03-04 04:22:01
may share soon but have to convince myself it's not a dead end first
hodapp 2017-03-04 04:23:31
are you in the dataHaskell gitter? may get the most interest there
hodapp 2017-03-04 04:23:40
dataframes are a thing I'd have a hard time living without in some fashion
vctrh 2017-03-04 04:23:46
yeah i'm over there
vctrh 2017-03-04 04:24:02
well even though i'm working on this data frame experiment
vctrh 2017-03-04 04:24:09
i'm not convinced it's the way to go
jchia_ 2017-03-04 04:24:15
jophish: I'm trying to increment an element in a Data.Vector.Sized.Vector 20 Int. What's the simplest way to do it? With regular vector, I just use Data.Vector.modify with Data.Vector.Mutable.modify. The context is I'm using Data.Vector.Sized.Vector as a histogram and I need to increment a bucket for each sample I encounter.
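For reference, the regular-vector approach mentioned in the question can be sketched as below. This is a minimal sketch using plain Data.Vector from the vector package, not Data.Vector.Sized; the names bump and histogram are made up for illustration, and the sized-vector equivalent is not shown here.

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Increment one bucket. V.modify runs the mutable update in place when
-- that is safe and on a copy of the vector otherwise.
bump :: Int -> V.Vector Int -> V.Vector Int
bump i = V.modify (\mv -> MV.modify mv (+ 1) i)

-- Build a whole histogram with n buckets in one call.
-- Each sample is assumed to already be a valid bucket index in [0, n).
histogram :: Int -> [Int] -> V.Vector Int
histogram n samples = V.accum (+) (V.replicate n 0) [(i, 1) | i <- samples]
```

When all samples are available up front, something like histogram avoids one modify call per sample; bump is closer to the one-bucket-at-a-time case described in the question.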
vctrh 2017-03-04 04:24:17
maybe haskell needs its own abstraction
hodapp 2017-03-04 04:24:47
it's not a rigidly-defined abstraction either. R does it one way, Python another with Pandas, and Scala another still in Spark
vctrh 2017-03-04 04:24:48
rather than transplanting R/Pandas verbatim. i keep thinking there's maybe a lens-y substitute for dataframe
hodapp 2017-03-04 04:25:06
doesn't Frames do something along these lines?
vctrh 2017-03-04 04:25:11
i'm not sure what that would look like but maybe someone does
vctrh 2017-03-04 04:25:30
I don't think Frames is it :/
vctrh 2017-03-04 04:25:41
it was a good try but the ergonomics of working with it are too problematic
hodapp 2017-03-04 04:25:50
I haven't worked much with it yet
vctrh 2017-03-04 04:26:01
doesn't scale to unwieldy CSVs (which are everywhere in data science)
vctrh 2017-03-04 04:26:22
or rather, doesn't handle unwieldy data gracefully
hodapp 2017-03-04 04:26:48
but whatever it is, I'd love if certain things that are sort of a royal pain in R - like doing transformations that turn a row into multiple rows but maybe variable numbers of them - were much more direct
vctrh 2017-03-04 04:26:51
for example if you have a few hundred columns or more, compile times get not so good
hodapp 2017-03-04 04:27:01
hmmm
vctrh 2017-03-04 04:27:09
also try loading in 2 or 3 CSVs and you'll inevitably have a namespace clash
vctrh 2017-03-04 04:27:16
with the accessors being derived by column names
vctrh 2017-03-04 04:27:46
it was a good try, but i think the template haskell route is *too* static an approach
Rembane 2017-03-04 04:28:08
What about going the Data.Map approach? It's not as Haskelly, but it could work.
Rembane 2017-03-04 04:28:24
You might need some prefixes for the keys though.
vctrh 2017-03-04 04:28:26
Rembane it kind of works. I've rolled my own data analyses around map and vector
vctrh 2017-03-04 04:28:43
the problem is you can write your own joins and stuff
vctrh 2017-03-04 04:28:48
but it gets to be slow
hodapp 2017-03-04 04:28:53
whatever it was, if it's going to be usable the way dataframes are in many languages, would have to be something where it's trivial to add new columns based on existing columns to do quick computations and such
hodapp 2017-03-04 04:29:09
and where things like melt/cast aren't monstrosities
Rembane 2017-03-04 04:29:15
vctrh: It is indeed. Hm...
vctrh 2017-03-04 04:29:15
medium size-ish data sets that you'd left_join and filter like nothing in R
vctrh 2017-03-04 04:29:34
start to become substantial obstacles with Map representations
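To make the "write your own joins" point concrete, here is a rough sketch of a hand-rolled left join over Data.Map. The Table synonym and the function names are hypothetical, and it assumes one row per key, which real data often violates.

```haskell
import qualified Data.Map.Strict as M

-- A "table" keyed by the join column; assumes exactly one row per key.
type Table k row = M.Map k row

-- Left join: keep every left-hand row and pair it with the matching
-- right-hand row when one exists.
leftJoin :: Ord k => Table k a -> Table k b -> Table k (a, Maybe b)
leftJoin left right = M.mapWithKey (\k a -> (a, M.lookup k right)) left

-- Keep only the rows that found a match, similar in spirit to a
-- filter applied after left_join in dplyr.
matchedOnly :: Table k (a, Maybe b) -> Table k (a, b)
matchedOnly = M.mapMaybe (\(a, mb) -> fmap (\b -> (a, b)) mb)
```

Handling duplicate keys would need something like Map k [row], and every further relational operation has to be written by hand in the same way.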
vctrh 2017-03-04 04:31:38
Rembane i'm experimenting with a pretty different approach to the problem now
Rembane 2017-03-04 04:32:37
vctrh: What's your current approach?
Rembane 2017-03-04 04:32:44
vctrh: Or rather, the different approach?
vctrh 2017-03-04 04:34:08
well might be premature to say because it could also be a dead end ... but the experiment i'm trying is to wrap a dplyr-like DSL around an in-memory sqlite database
bollu 2017-03-04 04:34:40
I'm reading lens over tea, I don't understand this sentence: "Now let's consider 2 cases – the first is when we use it to turn s into a, the second is when we use the iso to turn b into t. The definition we already have works well enough for the former case, but in the latter case s doesn't even exist" Link: https://artyom.me/lens-over-tea-4. Could someone please tell me what it is trying to say?
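One possible reading of the quoted sentence, sketched with a simplified iso made of two plain functions. This is an illustration only, not the definition used in the article or in the lens library, and every name below is hypothetical.

```haskell
-- A plain (non-profunctor) isomorphism: one function per direction.
data SimpleIso s t a b = SimpleIso
  { forward  :: s -> a   -- turning s into a: we start from an s
  , backward :: b -> t   -- turning b into t: we start from a b, no s in sight
  }

-- A lens-style setter takes the original structure as an argument...
lensStyleSet :: SimpleIso s t a b -> s -> b -> t
lensStyleSet iso _s b = backward iso b   -- ...but an iso never needs to look at it

-- ...so the reverse direction can run with no s at all.
review :: SimpleIso s t a b -> b -> t
review = backward

-- Example: swapping the components of a pair.
swapped :: SimpleIso (a, b) (c, d) (b, a) (d, c)
swapped = SimpleIso (\(x, y) -> (y, x)) (\(x, y) -> (y, x))
```

Under this reading, anything shaped like s -> ... works when going from s to a, but going from b to t has no s to supply, so that shape cannot be reused for the second direction.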
vctrh 2017-03-04 04:35:05
give relational operations to the relational gods. then couple that with some bridges to standard haskell data structures like Map and IO operations (like CSVs)
vctrh 2017-03-04 04:36:00
kind of flips the dplyr sql database backend model on its head
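A rough sketch of what a dplyr-like layer that compiles down to SQL text for SQLite might look like. The Query type, toSql and the |> operator are all hypothetical, not vctrh's actual design; predicates are raw strings with no escaping, and actually running the generated SQL against a database is left out.

```haskell
import Data.List (intercalate)

-- Hypothetical pipeline AST: each step wraps the query it operates on.
data Query
  = From String                   -- a table already loaded into SQLite
  | Filter String Query           -- raw SQL predicate, e.g. "age > 30"
  | Select [String] Query         -- keep only these columns
  | LeftJoin String String Query  -- other table name, shared join column
  deriving Show

-- Render a pipeline as nested SQL subqueries.
toSql :: Query -> String
toSql (From t)         = "SELECT * FROM " ++ t
toSql (Filter p q)     = "SELECT * FROM (" ++ toSql q ++ ") WHERE " ++ p
toSql (Select cols q)  = "SELECT " ++ intercalate ", " cols ++ " FROM (" ++ toSql q ++ ")"
toSql (LeftJoin t c q) = "SELECT * FROM (" ++ toSql q ++ ") LEFT JOIN " ++ t ++ " USING (" ++ c ++ ")"

-- dplyr-style left-to-right chaining of pipeline steps.
(|>) :: Query -> (Query -> Query) -> Query
q |> f = f q
infixl 1 |>
```

For example, From "people" |> Filter "age > 30" |> Select ["name"] renders to SELECT name FROM (SELECT * FROM (SELECT * FROM people) WHERE age > 30), which would then be handed to the database library.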
Rembane 2017-03-04 04:36:41
That does indeed sound interesting, I'm afraid I don't have any intuition on if it is a sound design.
bollu 2017-03-04 04:37:56
OMG lens is ungoogleable
vctrh 2017-03-04 04:38:26
neither do i. i think a question is how cleanly one can write an interpreter to sqlite. i would've tried building it on opaleye but i'm not sure if it's actively maintained
rdococ 2017-03-04 04:38:46
asdf
bollu 2017-03-04 04:38:55
how do you keep a message for someone? I forgot
bollu 2017-03-04 04:38:58
using lambdabot
vctrh 2017-03-04 04:39:12
so i'm trying to do it from scratch. no obvious reason why it wouldn't work but it's still an experiment at this stage.
vctrh 2017-03-04 04:40:19
alternatively, one could always use the many sql DSLs but i think they tend to be just slightly too low level for data analysis/data munging stuff.
bollu 2017-03-04 04:40:35
@msg ekmett "thanks for 'use' :) how would I have discovered it if it wasn't for your help?"
lambdabot 2017-03-04 04:40:36
Not enough privileges
bollu 2017-03-04 04:40:38
hm
Rembane 2017-03-04 04:40:42
vctrh: You could build your EDSL on top of a SQL EDSL.
bollu 2017-03-04 04:40:46
I had the same problem sometime back
bollu 2017-03-04 04:40:56
@tell ekmett "thanks for 'use' :) how would I have discovered it if it wasn't for your help?"
lambdabot 2017-03-04 04:40:56
Consider it noted.
vctrh 2017-03-04 04:41:26
Rembane like i said I'd use opaleye if I was confident their sqlite dsl wasn't trending towards abandonment
vctrh 2017-03-04 04:41:33
for now i'm building it on sqlite-simple
Rembane 2017-03-04 04:41:36
vctrh: Ah, good point.