From: kiran kumar
Newsgroups: gmane.text.pandoc
Subject: Re: Unable to generate citations in markdown_strict
Date: Wed, 16 Jun 2021 14:53:02 -0700 (PDT)
Message-ID: <1906411e-f239-4fb3-bc83-a279b167d101n@googlegroups.com>
To: pandoc-discuss

I am also trying to add math in this format within the markdown:
$$
Experimentalquestion:
\underbrace{
    \begin{cases} H: statistic_{misrepresentation} \neq statistic_{intentional}
    H0: statistic_{misrepresentation} = statistic_{intentional}
end{cases}
\text{verify truth of both statements}}
{\text{equality/inequality with an acceptable margin of statistical error}}
$$
This is not rendered as expected. Can you guide me on getting this format to work?
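
For comparison, a well-formed variant of the expression might look like the sketch below. It assumes the intent is a two-row `cases` block with a label under the brace. The changes are assumptions about that intent: the rows are separated with `\\`, the backslash on `\end{cases}` is restored, the word-like names are wrapped in `\text{...}`, and the label is attached with `_`. The `cases` environment and `\text` come from amsmath.

```latex
$$
\text{Experimental question: }
\underbrace{
  \begin{cases}
    H:   & \text{statistic}_{\text{misrepresentation}} \neq \text{statistic}_{\text{intentional}} \\
    H_0: & \text{statistic}_{\text{misrepresentation}} = \text{statistic}_{\text{intentional}}
  \end{cases}
  \quad \text{verify truth of both statements}
}_{\text{equality/inequality with an acceptable margin of statistical error}}
$$
```

Note that pandoc only parses `$$...$$` as math when the `tex_math_dollars` extension is active. It is on by default for `markdown` but not for `markdown_strict`, so with a strict reader the backslashes and underscores are treated as ordinary text.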

On Wednesday, 16 June 2021 at 12:34:43 UTC-7 John MacFarlane wrote:

markdown_strict doesn't support citations (that's an extension).
Try markdown.
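
A minimal sketch of that suggestion, assuming pandoc 2.11 or later (where `--citeproc` is built in) and the same file names as in the command quoted below. The variable name `pandoc_args` and the dry-run branch are illustrative additions, not part of the original command.

```shell
# Sketch only: same file names as the command quoted below.
# Changes from that command (assumptions about what was intended):
#   --citeproc   double dash; "-citeproc" would most likely be read as "-c iteproc" (--css)
#   -f markdown  pandoc Markdown, where the citations extension is on by default
#   no +citations on the output, so rendered text is written rather than [@key] syntax
# No argument contains whitespace, so the unquoted expansion below is safe.
pandoc_args="test.md --citeproc -f markdown -t markdown_strict+smart+yaml_metadata_block -s --bibliography blog.bib --csl acm.csl -o check.md"

if command -v pandoc >/dev/null 2>&1 && [ -f test.md ]; then
  pandoc $pandoc_args
else
  # Dry run when pandoc or the input file is not available.
  echo "would run: pandoc $pandoc_args"
fi
```

Reading with `-f markdown` also turns on `tex_math_dollars`, so `$$...$$` blocks in the input are parsed as math rather than escaped as plain text.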

kiran kumar <kran...@gmail.com> writes:

>
> Using the following command to generate citations
>
> pandoc test.md -citeproc -f markdown_strict+yaml_metadata_block -t
> markdown_strict+citations+smart+yaml_metadata_block -s --bibliography
> blog.bib --csl acm.csl -o check.md
>
> The test.md has a few citations but they are not rendered as references in
> check.md
>
> Is there something I am missing?
>
> --
> You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discus...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n%40googlegroups.com.
> ---
> bibliography: blog.bib
> csl: acm.csl
> date: "2020-10-128T20:20:00Z"
> draft: true
> title: Process of Data science - Measurement
> ---
>
> # Measurement variables
>
> In a previous post, the process of data science and forming a
> hypothesis was discussed. A hypothesis is relevant for aligning a
> business objective with a data science problem. The hypothesis provides a
> "big-picture" view of the issues which need to be considered in further
> steps of addressing a data science problem.
>
> The problem being considered is insurance fraud, and a good hypothesis
> for success could be “misrepresentation is different from intentional
> damage”. This hypothesis attempts to differentiate between
> misrepresentation and intentional damage.
>
>> Misrepresentation is said to occur when a claim is made on
>> nonexistent assets
>>
>> Intentional damage is said to occur when an insured asset is
>> intentionally damaged
>
> The next step after a hypothesis is established is to consider
> variables or factors affecting the hypothesis.
>
> 1. [Hypothesis](http://knkumar.com/blog/posts/data_science_process/)
> 2. Measurement variables (discussed here)
> 3. Latent or unobservable factors
> 4. Experimental design (0 to 1)
>     1. Controlling other factors to observe primary effect.
> 5. Collection and analysis of data for pattern discovery
>     1. Hypothesis driven Exploration
> 6. Modeling of patterns for prediction
>     1. Numerical Analysis for error reduction
>     2. Qualitative modeling
> 7. Generalizing or scaling the experiment (1 to n)
> 8. Establishing a baseline
> 9. Monitoring through controls and baselines
> 10. Ethics and governance
>
> ## The null Hypothesis
>
> Let us call our hypothesis “misrepresentation is different from
> intentional damage” - $H$ for mathematical convenience. This can be a
> hard thing to determine, and we can use ideas from *statistical testing*
> to develop a solution. A statistical testing process works by
> determining an antithesis, often called the null hypothesis, i.e., if the
> antithesis were true the hypothesis under consideration would not be
> true. An antithesis could be "misrepresentation is indistinguishable from
> intentional damage", call this $H\_0$.
>
> In a traditional scientific experiment, a statistical experiment would
> be possible by random assignment to conditions under test. In this
> scenario, one group of insured would generate misrepresentation whereas
> another group would generate intentional damage claims. Traditional
> hypothesis testing would calculate a statistic, say a mean, for data
> generated from the two groups and observe whether the statistics are
> significantly different from each other. $$ Experimental\\ question:
> \\underbrace{\\begin{cases} H: statistic*{misrepresentation}\\neq
> statistic*{intentional}\\ H*0: statistic*{misrepresentation} =
> statistic*{intentional} \\end{cases} \\text{verify truth of both
> statements}}* {\\text{equality/inequality with an acceptable margin of
> statistical error}} $$ In this scenario, misrepresentation and
> intentional damage are not randomly assigned or generated from insured
> parties. In fact, it would be facetious to conduct an experiment to
> study the problem at hand. Such a problem falls under the umbrella of a
> natural experiment or observational study depending on the circles you
> are in.
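>
> One common way to make "significantly different" concrete for two group
> means is Welch's two-sample statistic. The sketch below is an
> illustration, not a method the post commits to; the symbols $\bar{x}$,
> $s^2$ and $n$ (sample mean, sample variance and group size, with
> subscripts $m$ for misrepresentation and $i$ for intentional damage)
> are assumed names.
>
> ```latex
> t = \frac{\bar{x}_m - \bar{x}_i}
>          {\sqrt{\dfrac{s_m^2}{n_m} + \dfrac{s_i^2}{n_i}}}
> ```
>
> A value of $|t|$ that is large relative to the chosen margin of error
> counts as evidence against the null hypothesis.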
>
> In an observational study the assignment of the population to groups or
> conditions of the experiment is outside the investigator's purview. A
> hypothesis such as "smoking causes cancer" or "video games cause
> violence" [@engelhardt2011your] is harder to test in a purely
> scientific manner. In fact, the earlier position on video games by
> [@engelhardt2011your] has been attributed to priming by
> [@kuhn2019does] and the jury could still be out on this since we
> cannot guarantee homogeneity of the sample in testing for observed
> effects. In such scenarios the best we can do is run observational studies
> to gain more information about our hypothesis.
>
> ## What are Measurement Variables (aka Direct Factors)?
>
> In order to perform a *scientific study*, a data scientist should start
> by picking up on *signals* of misrepresentation and intentional damage.
> These signals are often referred to as measurement variables for
> modeling. The model of choice for such a problem is a discriminative
> model, i.e., a model discriminating fraud by misrepresentation from
> intentional damage. In the old but popular example of discriminating the
> iris species [@fisher1936use], the petal length/width and sepal
> length/width provided sufficient measurement variables for
> discrimination of the species using linear functions. In this iris
> analysis, the experiment was natural, i.e., not in the control of an
> experimenter.
>
> The term ***natural*** means the experimenter did not genetically modify
> the species to show variations; the variation in the species was
> naturally selected. On the other hand, in cases such as experiments with
> [fruit flies](https://bdsc.indiana.edu/about/index.html) (available at
> Indiana University for research), a scientist would study the species by
> "knocking out genes" or "inducing variations", creating a *controlled*
> experiment. The key in either case would be understanding the *factors*
> or **measurement variables** for the hypothesis under study.
>
> A **natural/observational experiment** is a useful alternative when a
> controlled experiment cannot be undertaken, as in the insurance example.
> It is important to note that a natural experiment can also have issues
> regarding confounding variables and bias which potentially invalidate
> the experiment.
>
> A ***confound*** (or confounding variable) can be defined as a factor
> which could directly or indirectly affect the response variable when
> considering a direct measurement. Let's take a concrete example here to
> understand this concept. Assume a scout is looking for talent in
> basketball (or a VC firm is scouting for investment, the analogy is
> similar). The scout assesses the talent using a few metrics such as
> average points per game and assists for offense, and rebounds, blocks,
> and steals for defense. There are *other aspects* (or confounds) which
> come into the purview of a scout, such as medical history and
> stability/improvement of stats, because these indicate the progression of
> a player and future outcomes. In many cases, a *confound* plays a large
> role. For example, a player with a debilitating shoulder injury could be
> a red flag since the future outcome could be weaker with a higher
> probability. The difficulty lies in ascertaining confounds for the
> hypothesis under study, which requires understanding the true nature of
> the effect a confound has on the hypothesis. A *targeted interview* with
> an expert (such as a claims investigator for insurance or a talent scout
> for sports) is a valuable tool in a data scientist's arsenal to understand
> the factors and confounds which should be considered as data to be
> included in a model. An interview provides the intuition or priors in a
> Bayesian context for data gathering and evaluation.
>
> A variable or factor discriminating ***misrepresentation*** from
> ***intentional damage*** could be identified based on multiple
> perspectives. Personally, I choose the word perspective as a line of
> attack/strategy to understand the contributing factors from first
> principles. This is a preferred approach, in my opinion, to throwing the
> kitchen sink at a dataset.
>
> #### Historical variables
>
> Historical variables can be obtained from similar categories of claims in
> the past. They are useful in understanding patterns of normal insurance
> claims and misrepresentation. Cost per type of damage could be a general
> factor to monitor, which requires categorizing the types of damage
> available in historical data. In many cases, the insurance system would
> place restrictions on the types of damage covered and bundle similar
> damages under a large umbrella (because it's easier to deal with one type
> and have a single process). For example, flooding could be due to natural
> events like weather (rain, storm, waves, etc.) or a pipe breaking due to
> stress or damage. Classifying the category at the right level is important
> in order to provide models the right level of information. Not focusing on
> data driven approaches when collecting data can *misclassify* labels by
> not having appropriate levels for a category, losing a lot of context.
>
> #### Textual variables
>
> Textual variables can be obtained from an insurance claim which asks
> pointed questions to a claimant. Many of the responses to the questions
> can be free-form text or speech which allow representation of the
> situation in the claim. A misrepresented claim can potentially have
> signals in the text used to describe the situation. Simple constructs
> would be overuse of certain elements to provide validity to the claim. A
> speech pattern can have inflection when misrepresenting facts, which can
> be captured by a model.
>
> Another common pattern to obtain signals is asking the same question
> with a different phrasing. Text or speech patterns for both questions
> should ideally be similar, and a measure of dissimilarity can be used
> by a model to discriminate between misrepresentation and intentional
> damage. The details of spacing between the questions and phrasing are
> experimental variables in the hands of the data scientist to gather
> useful signals.
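>
> As one possible measure of dissimilarity (an assumption for
> illustration, since the choice of measure is left open here), the
> Jaccard distance between the sets of words $A$ and $B$ used in the two
> answers is:
>
> ```latex
> d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}
> ```
>
> Identical word sets give $d_J = 0$ and disjoint sets give $d_J = 1$, so
> larger values flag answers that diverge when the same question is
> rephrased.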
>
> #### Social variables
>
> Social variables can be obtained from aspects of social interaction such
> as association with similar groups, participation in similar events, or
> mining social media sites such as Facebook, Twitter, Snapchat, etc. The
> usage of social variables stems from the phrase "neurons that fire
> together wire together", implying that if there is a person who filed a
> claim with misrepresentation or intentional damage, another person could
> be correlated to do so through social bonds.
>
> Personally, I am not a proponent of using social variables, but in some
> cases they can provide useful information akin to a prior for the model.
> A data scientist needs to be careful in ensuring the prior or social
> variables can be overcome by evidence in either direction.
>
> #### Economic variables
>
> ## Identifying measurement variables
>
> ### Correlation
>
> ### Separation of classes
> # articles for reinforcement learning
> @article{vinyals2017starcraft,
> title={Starcraft ii: A new challenge for reinforcement learning},
> author={Vinyals, Oriol and Ewalds, Timo and Bartunov, Sergey and Georgiev, Petko and Vezhnevets, Alexander Sasha and Yeo, Michelle and Makhzani, Alireza and K{\"u}ttler, Heinrich and Agapiou, John and Schrittwieser, Julian and others},
> journal={arXiv preprint arXiv:1708.04782},
> url={https://arxiv.org/pdf/1708.04782},
> year={2017}
> }
> @article{dulac2019challenges,
> title={Challenges of real-world reinforcement learning},
> author={Dulac-Arnold, Gabriel and Mankowitz, Daniel and Hester, Todd},
> journal={arXiv preprint arXiv:1904.12901},
> url={https://arxiv.org/pdf/1904.12901},
> year={2019}
> }
>
> # articles on data science
> @article{engelhardt2011your,
> title={This is your brain on violent video games: Neural desensitization to violence predicts increased aggression following violent video game exposure},
> author={Engelhardt, Christopher R and Bartholow, Bruce D and Kerr, Geoffrey T and Bushman, Brad J},
> journal={Journal of Experimental Social Psychology},
> volume={47},
> number={5},
> pages={1033--1036},
> year={2011},
> url={https://hal.archives-ouvertes.fr/peer-00995254/document},
> publisher={Elsevier}
> }
> @article{kuhn2019does,
> title={Does playing violent video games cause aggression? A longitudinal intervention study},
> author={K{\"u}hn, Simone and Kugler, Dimitrij Tycho and Schmalen, Katharina and Weichenberger, Markus and Witt, Charlotte and Gallinat, J{\"u}rgen},
> journal={Molecular psychiatry},
> volume={24},
> number={8},
> pages={1220--1234},
> year={2019},
> url={https://www.nature.com/articles/s41380-018-0031-7},
> publisher={Nature Publishing Group}
> }
> @article{fisher1936use,
> title={The use of multiple measurements in taxonomic problems},
> author={Fisher, Ronald A},
> journal={Annals of eugenics},
> volume={7},
> number={2},
> pages={179--188},
> year={1936},
> url={https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-1809.1936.tb02137.x},
> publisher={Wiley Online Library}
> }
