public inbox archive for pandoc-discuss@googlegroups.com
* Unable to generate citations in markdown_strict
@ 2021-06-16 17:01 kiran kumar
       [not found] ` <f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  0 siblings, 1 reply; 4+ messages in thread
From: kiran kumar @ 2021-06-16 17:01 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 761 bytes --]

 

I am using the following command to generate citations:

pandoc test.md  -citeproc -f markdown_strict+yaml_metadata_block -t 
markdown_strict+citations+smart+yaml_metadata_block -s --bibliography 
blog.bib --csl acm.csl -o check.md

The test.md has a few citations, but they are not rendered as references in
check.md.

Is there something I am missing?

-- 
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 1079 bytes --]

[-- Attachment #2: test.md --]
[-- Type: text/markdown, Size: 10299 bytes --]

---
bibliography: blog.bib
csl: acm.csl
date: "2020-10-128T20:20:00Z"
draft: true
title: Process of Data science - Measurement
---

# Measurement variables

In a previous post, the process of data science and forming a
hypothesis was discussed. A hypothesis is relevant for aligning a
business objective to a data science problem. The hypothesis provides a
"big-picture" view of the issues which need to be considered in further
steps of addressing a data science problem.

The problem being considered is insurance fraud, and a good hypothesis
for success could be “misrepresentation is different from intentional
damage”. This hypothesis attempts to differentiate between
misrepresentation and intentional damage.

> Misrepresentation is said to occur when a claim is made on
> nonexistent assets
>
> Intentional damage is said to occur when an insured asset is
> intentionally damaged

The next step after a hypothesis is established is to consider the
variables or factors affecting the hypothesis.

1.  [Hypothesis](http://knkumar.com/blog/posts/data_science_process/)
2.  Measurement variables (discussed here)
3.  Latent or unobservable factors
4.  Experimental design (0 to 1)
    1.  Controlling other factors to observe primary effect.
5.  Collection and analysis of data for pattern discovery
    1.  Hypothesis driven Exploration
6.  Modeling of patterns for prediction
    1.  Numerical Analysis for error reduction
    2.  Qualitative modeling
7.  Generalizing or scaling the experiment (1 to n)
8.  Establishing a baseline
9.  Monitoring through controls and baselines
10. Ethics and governance

## The null hypothesis

Let us call our hypothesis “misrepresentation is different from
intentional damage” $H$ for mathematical convenience. This can be a
hard thing to determine, and we can use ideas from *statistical testing*
to develop a solution. A statistical testing process works by
determining an antithesis, often called the null hypothesis, i.e., if the
antithesis were true, the hypothesis under consideration would not be
true. An antithesis could be "misrepresentation is indistinguishable from
intentional damage"; call this $H_0$.

In a traditional scientific experiment, a statistical experiment would
be possible by random assignment to conditions under test. In this
scenario, one group of insured would generate misrepresentation whereas
another group would generate intentional damage claims. Traditional
hypothesis testing would calculate a statistic, say a mean, for the data
generated from the two groups and observe whether the statistics are
significantly different from each other.

$$
\text{Experimental question:}\quad
\underbrace{
\begin{cases}
H: \text{statistic}_{\text{misrepresentation}} \neq \text{statistic}_{\text{intentional}} \\
H_0: \text{statistic}_{\text{misrepresentation}} = \text{statistic}_{\text{intentional}}
\end{cases}
\text{verify truth of both statements}
}_{\text{equality/inequality with an acceptable margin of statistical error}}
$$

In this scenario, misrepresentation and intentional damage are not
randomly assigned or generated from insured parties. In fact, it would
be absurd to conduct an experiment to study the problem at hand. Such a
problem falls under the umbrella of a natural experiment or an
observational study, depending on the circles you are in.
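
The two-group comparison above can be sketched with synthetic data.
The following is a minimal illustration only: the claim amounts, group
sizes, and the choice of a permutation test are all invented for this
sketch, not taken from any real study.

```python
import random
import statistics

random.seed(0)

# Hypothetical claim amounts for the two groups (made-up numbers).
misrep = [random.gauss(5000, 800) for _ in range(200)]
intent = [random.gauss(5600, 800) for _ in range(200)]

# The statistic of interest: difference of group means.
observed = abs(statistics.mean(misrep) - statistics.mean(intent))

# Permutation test of H0 (the statistics are equal): shuffle group
# labels and see how often a difference at least as large as the
# observed one arises by chance.
pooled = misrep + intent
trials, extreme = 2000, 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(statistics.mean(pooled[:200]) - statistics.mean(pooled[200:]))
    if diff >= observed:
        extreme += 1

p_value = extreme / trials  # a small p-value favors H over H0
```

A permutation test makes no distributional assumptions, which matches
the spirit of comparing an observed statistic against what random
assignment would produce.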

In an observational study, the assignment of the population to groups or
conditions of the experiment is outside the investigator's purview. A
hypothesis such as "smoking causes cancer" or "video games cause
violence" [@engelhardt2011your] is harder to test in a purely
scientific manner. In fact, the earlier position on video games by
[@engelhardt2011your] has been attributed to priming by
[@kuhn2019does], and the jury could still be out on this since we
cannot guarantee homogeneity of the sample in testing for observed
effects. In such scenarios the best we can do are observational studies
to gain more information about our hypothesis.

## What are Measurement Variables (aka Direct Factors)?

In order to perform a *scientific study*, a data scientist should start
by picking up on *signals* of misrepresentation and intentional damage.
These signals are often referred to as measurement variables for
modeling. The model of choice for such a problem is a discriminative
model, i.e., a model that discriminates between misrepresentation and
intentional damage. In the old but popular example of discriminating the
iris species [@fisher1936use], the petal length/width and sepal
length/width provided sufficient measurement variables for
discrimination of the species using linear functions. In this iris
analysis, the experiment was natural, i.e., not in the control of an
experimenter.
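
In that spirit, here is a minimal sketch of a one-variable linear
discrimination: the petal lengths below are invented for illustration,
not Fisher's measurements.

```python
# One measurement variable (petal length, cm) and a linear threshold.
# Values are hypothetical, chosen only to mimic two separable species.
setosa_petal = [1.4, 1.3, 1.5, 1.4, 1.6]
versicolor_petal = [4.7, 4.5, 4.9, 4.0, 4.6]

# Midpoint of the class means serves as the decision boundary.
m_setosa = sum(setosa_petal) / len(setosa_petal)
m_versicolor = sum(versicolor_petal) / len(versicolor_petal)
threshold = (m_setosa + m_versicolor) / 2

def classify(petal_length):
    """Linear rule: which side of the threshold does the sample fall on?"""
    return "setosa" if petal_length < threshold else "versicolor"
```

The point is not the rule itself but that a well-chosen measurement
variable makes a simple linear function sufficient.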

The term ***natural*** means the experimenter did not genetically modify
the species to show variations; the variation in the species arose
through natural selection. On the other hand, in cases such as experiments with
[fruit flies](https://bdsc.indiana.edu/about/index.html) (available at
Indiana University for research), a scientist would study the species by
"knocking out genes" or "inducing variations" creating a *controlled*
experiment. The key in either case would be understanding the *factors*
or **measurement variables** for the hypothesis under study.

A **natural/observational experiment** is a useful alternative when a
controlled experiment cannot be undertaken, as in the insurance example.
It is important to note that a natural experiment can also have issues
regarding confounding variables and bias which potentially invalidate
the experiment.

A ***confound*** (or confounding variable) can be defined as a factor
which could directly or indirectly affect the response variable when
considering a direct measurement. Let's take a concrete example to
understand this concept. Assume a scout is looking for talent in
basketball (or a VC firm is scouting for investment; the analogy is
similar). The scout assesses talent using a few metrics, such as
average points per game and assists for offense, and rebounds, blocks,
and steals for defense. There are *other aspects* (or confounds) which
come into the purview of a scout, such as medical history and the
stability or improvement of stats, because these indicate the
progression of a player and future outcomes. In many cases, a
*confound* plays a large role. For example, a player with a
debilitating shoulder injury could be a red flag, since a weaker future
outcome becomes more probable. The difficulty is in ascertaining
confounds for the hypothesis under study, which requires understanding
the true nature of the effect a confound has on the hypothesis.

A *targeted interview* with an expert (such as a claims investigator
for insurance or a talent scout for sports) is a valuable tool in a
data scientist's arsenal to understand the factors and confounds which
should be considered as data to be included in a model. An interview
provides the intuition, or priors in a Bayesian context, for data
gathering and evaluation.

A variable or factor discriminating ***misrepresentation*** from
***intentional damage*** could be identified based on multiple
perspectives. Personally, I choose the word perspective as a line of
attack/strategy to understand the contributing factors from first
principles. This is a preferred approach, in my opinion, to throwing the
kitchen sink at a dataset.

#### Historical variables

Historical variables can be obtained from similar categories of claims
in the past. They are useful in understanding patterns of normal
insurance claims and misrepresentation. Cost per type of damage could
be a general factor to monitor, which requires categorizing the types
of damage available in historical data. In many cases, the insurance
system would place restrictions on the types of damage covered and
bundle similar damages under a large umbrella (because it's easier to
deal with one type and have a single process). For example, flooding
could be due to natural events like weather (rain, storm, waves, etc.)
or to a pipe breaking under stress or damage. Classifying claims at the
right level is important in order to provide models the right level of
information; a purely data-driven approach to collection can
*misclassify* labels by not having appropriate levels for a category,
losing a lot of context.
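
As a sketch, here is cost per type of damage computed at two levels of
granularity; the categories and amounts are entirely hypothetical.

```python
from collections import defaultdict

# Hypothetical historical claims: (damage category, cost).
claims = [
    ("flood/weather", 12000.0),
    ("flood/pipe", 3500.0),
    ("flood/weather", 8000.0),
    ("fire", 25000.0),
    ("flood/pipe", 4200.0),
]

by_subtype = defaultdict(list)
by_umbrella = defaultdict(list)
for category, cost in claims:
    by_subtype[category].append(cost)
    by_umbrella[category.split("/")[0]].append(cost)  # bundled level

avg_subtype = {c: sum(v) / len(v) for c, v in by_subtype.items()}
avg_umbrella = {c: sum(v) / len(v) for c, v in by_umbrella.items()}
# Bundling all "flood" claims hides that weather floods cost far more
# than pipe floods -- the lost context discussed above.
```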

#### Textual variables

Textual variables can be obtained from an insurance claim which asks a
claimant pointed questions. Many of the responses can be free-form text
or speech, which allow a representation of the situation in the claim.
A misrepresented claim can potentially have signals in the text
describing the situation. A simple construct would be the overuse of
certain elements to lend validity to the claim. Speech patterns can
also carry inflections when facts are misrepresented, which a model can
capture.

Another common pattern to obtain signals is to ask the same question
with different phrasing. Text or speech patterns for both questions
should ideally be similar, and a measure of dissimilarity can be used
by a model to discriminate between misrepresentation and intentional
damage. The details of the spacing between the questions and their
phrasing are experimental variables in the hands of the data scientist
for gathering useful signals.
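
One simple dissimilarity measure for two such responses is Jaccard
distance on word sets. The measure and the example sentences are
illustrative only; a production model would use something richer.

```python
def jaccard_dissimilarity(a, b):
    """1 - Jaccard similarity of the two responses' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not (wa | wb):
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)

# Hypothetical answers to the same question, phrased differently.
first = "the pipe burst and flooded the basement overnight"
second = "water damage happened when the old pipe suddenly burst"
score = jaccard_dissimilarity(first, second)
# Consistent answers score low; inconsistent answers score high.
```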

#### Social variables

Social variables can be obtained from aspects of social interaction,
such as association with similar groups, participation in similar
events, or mining social media sites such as Facebook, Twitter,
Snapchat, etc. The usage of social variables stems from the phrase
"neurons that fire together wire together", implying that if one person
filed a claim with misrepresentation or intentional damage, a connected
person could be correlated to do so through social bonds.

Personally, I am not a proponent of using social variables, but in some
cases they can provide useful information akin to a prior for the
model. A data scientist needs to be careful to ensure the prior, or
social variables, can be overcome by evidence in either direction.
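
A sketch of a prior being overcome by evidence, using a Beta-Binomial
update; all the numbers are invented for illustration.

```python
# Prior from social variables: roughly a 20% suspicion of fraud,
# encoded weakly as Beta(2, 8) so that evidence can move it.
alpha, beta = 2.0, 8.0
prior_mean = alpha / (alpha + beta)  # 0.2

# Evidence: 40 investigated claims from this group, 2 confirmed fraudulent.
fraud, clean = 2, 38
posterior_mean = (alpha + fraud) / (alpha + beta + fraud + clean)
# The observed 5% rate pulls the estimate well below the prior.
```

The weak prior (pseudo-counts of 2 and 8) is the design choice: it
encodes the social signal without letting it dominate the data.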

#### Economic variables

## Identifying measurement variables

### Correlation

### Separation of classes

[-- Attachment #3: blog.bib --]
[-- Type: text/x-bibtex, Size: 2073 bytes --]

# articles for reinforcement learning
@article{vinyals2017starcraft,
  title={Starcraft ii: A new challenge for reinforcement learning},
  author={Vinyals, Oriol and Ewalds, Timo and Bartunov, Sergey and Georgiev, Petko and Vezhnevets, Alexander Sasha and Yeo, Michelle and Makhzani, Alireza and K{\"u}ttler, Heinrich and Agapiou, John and Schrittwieser, Julian and others},
  journal={arXiv preprint arXiv:1708.04782},
  url={https://arxiv.org/pdf/1708.04782},
  year={2017}
}
@article{dulac2019challenges,
  title={Challenges of real-world reinforcement learning},
  author={Dulac-Arnold, Gabriel and Mankowitz, Daniel and Hester, Todd},
  journal={arXiv preprint arXiv:1904.12901},
  url={https://arxiv.org/pdf/1904.12901},
  year={2019}
}

# articles on data science
@article{engelhardt2011your,
  title={This is your brain on violent video games: Neural desensitization to violence predicts increased aggression following violent video game exposure},
  author={Engelhardt, Christopher R and Bartholow, Bruce D and Kerr, Geoffrey T and Bushman, Brad J},
  journal={Journal of Experimental Social Psychology},
  volume={47},
  number={5},
  pages={1033--1036},
  year={2011},
  url={https://hal.archives-ouvertes.fr/peer-00995254/document},
  publisher={Elsevier}
}
@article{kuhn2019does,
  title={Does playing violent video games cause aggression? A longitudinal intervention study},
  author={K{\"u}hn, Simone and Kugler, Dimitrij Tycho and Schmalen, Katharina and Weichenberger, Markus and Witt, Charlotte and Gallinat, J{\"u}rgen},
  journal={Molecular psychiatry},
  volume={24},
  number={8},
  pages={1220--1234},
  year={2019},
  url={https://www.nature.com/articles/s41380-018-0031-7},
  publisher={Nature Publishing Group}
}
@article{fisher1936use,
  title={The use of multiple measurements in taxonomic problems},
  author={Fisher, Ronald A},
  journal={Annals of eugenics},
  volume={7},
  number={2},
  pages={179--188},
  year={1936},
  url={https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-1809.1936.tb02137.x},
  publisher={Wiley Online Library}
}

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Unable to generate citations in markdown_strict
       [not found] ` <f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
@ 2021-06-16 17:19   ` Joseph Reagle
  2021-06-16 19:34   ` John MacFarlane
  1 sibling, 0 replies; 4+ messages in thread
From: Joseph Reagle @ 2021-06-16 17:19 UTC (permalink / raw)
  To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw

You have some odd things going on with your parameters (e.g., it's `-C` or `--citeproc`), but the following works for me when the from format is *not* strict.

```
pandoc test.md  --citeproc -f markdown+yaml_metadata_block -t markdown_strict+smart+yaml_metadata_block -s --bibliography blog.bib --csl acm-sigchi-proceedings.csl
```



* Re: Unable to generate citations in markdown_strict
       [not found] ` <f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
  2021-06-16 17:19   ` Joseph Reagle
@ 2021-06-16 19:34   ` John MacFarlane
       [not found]     ` <m21r9138os.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
  1 sibling, 1 reply; 4+ messages in thread
From: John MacFarlane @ 2021-06-16 19:34 UTC (permalink / raw)
  To: kiran kumar, pandoc-discuss


markdown_strict doesn't support citations (that's an extension).
Try markdown.

kiran kumar <krankumar-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

>  
>
> Using the following command to generate citations
>
> pandoc test.md  -citeproc -f markdown_strict+yaml_metadata_block -t 
> markdown_strict+citations+smart+yaml_metadata_block -s --bibliography 
> blog.bib --csl acm.csl -o check.md
>
> The test.md has a few citations but it is not rendered as references in the 
> check.md
>
> Is there something I am missing?
>




* Re: Unable to generate citations in markdown_strict
       [not found]     ` <m21r9138os.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
@ 2021-06-16 21:53       ` kiran kumar
  0 siblings, 0 replies; 4+ messages in thread
From: kiran kumar @ 2021-06-16 21:53 UTC (permalink / raw)
  To: pandoc-discuss


[-- Attachment #1.1: Type: text/plain, Size: 15319 bytes --]

I am also trying to add math in this format within the markdown:
$$ 
Experimentalquestion:
\underbrace{
    \begin{cases} H: statistic_{misrepresentation} \neq 
statistic_{intentional}
     H0: statistic_{misrepresentation} = statistic_{intentional} 
end{cases} 
\text{verify truth of both statements}} 
{\text{equality/inequality with an acceptable margin of statistical error}} 
$$

This is not rendered as expected. Can you guide me on getting this format 
to work? 
On Wednesday, 16 June 2021 at 12:34:43 UTC-7 John MacFarlane wrote:

>
> markdown_strict doesn't support citations (that's an extension).
> Try markdown.
>
> kiran kumar <kran...-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> > 
> >
> > Using the following command to generate citations
> >
> > pandoc test.md -citeproc -f markdown_strict+yaml_metadata_block -t 
> > markdown_strict+citations+smart+yaml_metadata_block -s --bibliography 
> > blog.bib --csl acm.csl -o check.md
> >
> > The test.md has a few citations but it is not rendered as references in 
> the 
> > check.md
> >
> > Is there something I am missing?
> >
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "pandoc-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to pandoc-discus...-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/pandoc-discuss/f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n%40googlegroups.com
> .
> > ---
> > bibliography: blog.bib
> > csl: acm.csl
> > date: "2020-10-128T20:20:00Z"
> > draft: true
> > title: Process of Data science - Measurement
> > ---
> >
> > # Measurement variables
> >
> > In a previous post, the process of data science and forming an
> > hypothesis is discussed. A hypothesis is the relevant to align a
> > business objective to a data science problem. The hypothesis provides a
> > "big-picture" view of the issues which need to considered in further
> > steps of addressing a data science problem.
> >
> > The problem being considered is insurance fraud, and a good hypothesis
> > for success could be “misrepresentation is different from intentional
> > damage”. This hypothesis attempts to differentiate between
> > misrepresentation and intentional damage.
> >
> >> Misrepresentiation is said to occur when a claim is made on
> >> nonexistent assets
> >>
> >> Intentional damage is said to occur when an insured asset is
> >> intentionally damaged
> >
> > The next step after an hypothesis is established is to consider
> > variables or factors affecting the hypothesis.
> >
> > 1. [Hypothesis](http://knkumar.com/blog/posts/data_science_process/)
> > 2. Measurement variables (discussed here)
> > 3. Latent or unobservable factors
> > 4. Experimental design (0 to 1)
> >     1. Controlling other factors to observe the primary effect
> > 5. Collection and analysis of data for pattern discovery
> >     1. Hypothesis-driven exploration
> > 6. Modeling of patterns for prediction
> >     1. Numerical analysis for error reduction
> >     2. Qualitative modeling
> > 7. Generalizing or scaling the experiment (1 to n)
> > 8. Establishing a baseline
> > 9. Monitoring through controls and baselines
> > 10. Ethics and governance
> >
> > ## The Null Hypothesis
> >
> > Let us call our hypothesis “misrepresentation is different from
> > intentional damage” $H$, for mathematical convenience. This can be a
> > hard thing to determine, and we can use ideas from *statistical testing*
> > to develop a solution. A statistical testing process works by
> > determining an antithesis, often called the null hypothesis, i.e., if the
> > antithesis were true, the hypothesis under consideration would not be
> > true. An antithesis could be "misrepresentation is indistinguishable from
> > intentional damage"; call this $H_0$.
> >
> > In a traditional scientific experiment, a statistical experiment would
> > be possible by random assignment to the conditions under test. In this
> > scenario, one group of insured would generate misrepresentation whereas
> > another group would generate intentional damage claims. Traditional
> > hypothesis testing would calculate a statistic, say a mean, for the data
> > generated from the two groups and observe whether the statistics are
> > significantly different from each other.
> >
> > $$
> > Experimental\ question:
> > \underbrace{\begin{cases}
> > H: statistic_{misrepresentation} \neq statistic_{intentional} \\
> > H_0: statistic_{misrepresentation} = statistic_{intentional}
> > \end{cases} \text{verify truth of both statements}}_{\text{equality/inequality with an acceptable margin of statistical error}}
> > $$
> >
> > In this scenario, misrepresentation and intentional damage are not
> > randomly assigned or generated from insured parties. In fact, it would
> > be facetious to conduct an experiment to study the problem at hand.
> > Such a problem falls under the umbrella of a natural experiment or
> > observational study, depending on the circles you are in.
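[A small illustration of the testing logic described above: a two-sample permutation test on the difference of group means. The claim amounts are invented for the sketch, not data from the post.]

```python
import random
import statistics

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sample permutation test on the difference of means.

    Returns a p-value for H0: both groups come from one distribution.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical claim amounts (in $1000s) for the two fraud types.
misrepresentation = [12.1, 14.3, 11.8, 15.0, 13.2, 14.8, 12.9]
intentional = [8.2, 7.9, 9.5, 8.8, 7.4, 9.1, 8.5]

p = permutation_test(misrepresentation, intentional)
# A small p-value rejects H0 (the groups being indistinguishable).
```

A permutation test is used here because it makes no distributional assumptions, matching the "acceptable margin of statistical error" framing without requiring normality.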
> >
> > In an observational study, the assignment of the population to groups or
> > conditions of the experiment is outside the investigator's purview. A
> > hypothesis such as "smoking causes cancer" or "video games cause
> > violence" [@engelhardt2011your] is harder to test in a purely
> > scientific manner. In fact, the earlier position on video games by
> > [@engelhardt2011your] has been attributed to priming by
> > [@kuhn2019does], and the jury could still be out on this since we
> > cannot guarantee homogeneity of the sample when testing for observed
> > effects. In such scenarios, the best we can do is observational studies
> > to gain more information about our hypothesis.
> >
> > ## What are Measurement Variables (aka Direct Factors)?
> >
> > In order to perform a *scientific study*, a data scientist should start
> > by picking up on *signals* of misrepresentation and intentional damage.
> > These signals are often referred to as measurement variables for
> > modeling. The model of choice for such a problem is a discriminative
> > model, i.e., a model discriminating fraud of misrepresentation and
> > intentional damage. In the old but popular example of discriminating the
> > iris species [@fisher1936use], the petal length/width and sepal
> > length/width provided sufficient measurement variables for
> > discrimination of the species using linear functions. In this iris
> > analysis, the experiment was natural, i.e., not in the control of an
> > experimenter.
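[Fisher's linear-discrimination idea quoted above can be sketched in a few lines: project the two classes onto the direction w = Sw⁻¹(m₁ − m₂) and separate them with a midpoint threshold. The petal measurements below are illustrative stand-ins, not Fisher's actual iris data.]

```python
# Minimal two-class Fisher discriminant in 2-D (petal length, petal width).
# Measurements are made-up stand-ins for two iris species.

def mean(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]

def scatter(rows, m):
    # Within-class scatter: sum of (x - m)(x - m)^T over the class.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        dx, dy = x - m[0], y - m[1]
        s[0][0] += dx * dx; s[0][1] += dx * dy
        s[1][0] += dy * dx; s[1][1] += dy * dy
    return s

setosa = [(1.4, 0.2), (1.3, 0.2), (1.5, 0.3), (1.4, 0.3)]
versicolor = [(4.5, 1.5), (4.1, 1.3), (4.7, 1.4), (4.4, 1.4)]

m1, m2 = mean(setosa), mean(versicolor)
s1, s2 = scatter(setosa, m1), scatter(versicolor, m2)
sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]

# w = Sw^-1 (m1 - m2), via the explicit 2x2 inverse.
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
d = [m1[0] - m2[0], m1[1] - m2[1]]
w = [(sw[1][1] * d[0] - sw[0][1] * d[1]) / det,
     (-sw[1][0] * d[0] + sw[0][0] * d[1]) / det]

# Project onto w; a midpoint threshold separates the two species.
proj = lambda pt: w[0] * pt[0] + w[1] * pt[1]
threshold = (proj(m1) + proj(m2)) / 2
```

With well-separated classes like these, every setosa projection falls on one side of the threshold and every versicolor projection on the other, which is exactly the linear-function discrimination the post refers to.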
> >
> > The term ***natural*** means the experimenter did not genetically modify
> > the species to show variations, the variation in the species was
> > naturally selected. On the other hand, in cases such as experiments with
> > [fruit flies](https://bdsc.indiana.edu/about/index.html) (available at
> > Indiana University for research), a scientist would study the species by
> > "knocking out genes" or "inducing variations" creating a *controlled*
> > experiment. The key in either case would be understanding the *factors*
> > or **measurement variables** for the hypothesis under study.
> >
> > A **natural/observational experiment** is a useful alternative when a
> > controlled experiment cannot be undertaken, as in the insurance example.
> > It is important to note that a natural experiment can also have issues
> > with confounding variables and bias which potentially invalidate
> > the experiment.
> >
> > A ***confound*** (or confounding variable) can be defined as a factor
> > which could directly or indirectly affect the response variable when
> > considering a direct measurement. Let's take a concrete example here to
> > understand this concept. Assume a scout is looking for talent in
> > basketball (or a VC firm is scouting for investment, the analogy is
> > similar). The scout assesses the talent using a few metrics, such as
> > average points per game and assists for offense, and rebounds, blocks,
> > and steals for defense. There are *other aspects* (or confounds) which
> > come into the purview of a scout, such as medical history and
> > stability/improvement of stats, because these indicate the progression of
> > a player and future outcomes. In many cases, a *confound* plays a large
> > role. For example, a player with a debilitating shoulder injury could be
> > a red flag since the future outcome could be weaker with a higher
> > probability. The difficulty lies in ascertaining confounds for the
> > hypothesis under study, which requires understanding the true nature of
> > the effect a confound has on the hypothesis. A *targeted interview* with
> > an expert (such as a claims investigator for insurance or a talent scout
> > for sports) is a valuable tool in a data scientist's arsenal for
> > understanding the factors and confounds which should be considered as
> > data to be included in a model. An interview provides the intuition, or
> > priors in a Bayesian context, for data gathering and evaluation.
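[The scouting confound described above can be made concrete with a small simulation (all numbers invented): an injury flag directly lowers the future outcome, independent of the observed stat, so ignoring it hides a large effect.]

```python
import random
import statistics

rng = random.Random(42)

players = []
for _ in range(2000):
    talent = rng.gauss(20, 4)           # latent skill
    injured = rng.random() < 0.3        # the confound
    points = talent + rng.gauss(0, 1)   # observed stat
    # Injury hurts the future outcome directly, independent of points.
    future = talent - (8 if injured else 0) + rng.gauss(0, 1)
    players.append((points, injured, future))

# Naive view: one pooled mean, ignoring the confound.
overall = statistics.mean(f for _, _, f in players)
# Stratified view: condition on the confound.
healthy = statistics.mean(f for _, inj, f in players if not inj)
hurt = statistics.mean(f for _, inj, f in players if inj)
gap = healthy - hurt  # recovers the injury effect built into the simulation
```

Stratifying on the confound recovers the roughly eight-point injury penalty that the pooled average smears across all players, which is why the scout (or the data scientist) must identify such variables up front.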
> >
> > A variable or factor discriminating ***misrepresentation*** from
> > ***intentional damage*** could be identified based on multiple
> > perspectives. Personally, I choose the word perspective as a line of
> > attack/strategy to understand the contributing factors from first
> > principles. This is a preferred approach, in my opinion, to throwing the
> > kitchen sink at a dataset.
> >
> > #### Historical variables
> >
> > Historical variables can be obtained from similar categories of claims in
> > the past. They are useful in understanding patterns of normal insurance
> > claims and misrepresentation. Cost per type of damage could be a general
> > factor to monitor, which requires categorizing the types of damage
> > available in historical data. In many cases, the insurance system would
> > place restrictions on the types of damage covered and bundle similar
> > damages under a large umbrella (because it's easier to deal with one type
> > and have a single process). For example, flooding could be due to natural
> > events like weather (rain, storm, waves, etc.) or a pipe breaking due to
> > stress or damage. Classifying the category at the right level is
> > important in order to provide models the right level of information; a
> > purely data-driven approach to collecting data can *misclassify* labels
> > by not having appropriate levels for a category, losing a lot of context.
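[A minimal sketch of the cost-per-damage-type factor, keeping the category at a useful level of granularity. The field names and amounts are hypothetical.]

```python
from collections import defaultdict

# Hypothetical historical claims: (damage category, claimed amount).
# Note the split of "flood" into weather vs pipe, rather than one umbrella.
claims = [
    ("flood/weather", 12_000), ("flood/pipe", 4_500),
    ("flood/weather", 18_000), ("fire", 30_000),
    ("flood/pipe", 5_200), ("fire", 27_000),
]

totals, counts = defaultdict(float), defaultdict(int)
for category, amount in claims:
    totals[category] += amount
    counts[category] += 1

# Average cost per damage type, the monitoring factor described above.
avg_cost = {c: totals[c] / counts[c] for c in totals}
```

Collapsing `flood/weather` and `flood/pipe` into a single `flood` label here would average two very different cost profiles, which is the misclassification-by-coarse-category risk the paragraph warns about.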
> >
> > #### Textual variables
> >
> > Textual variables can be obtained from an insurance claim which asks
> > pointed questions to a claimant. Many of the responses to the questions
> > can be free form text or speech which allow representation of the
> > situation in the claim. A misrepresented claim can potentially carry
> > signals in the text describing the situation. A simple construct would
> > be the overuse of certain elements to lend validity to the claim. A
> > speech pattern can have inflections when misrepresenting facts, which can
> > be captured by a model.
> >
> > Another common pattern to obtain signals is to ask the same question
> > with different phrasing. Text or speech patterns for both questions
> > should ideally be similar, and a measure of dissimilarity can be used
> > by a model to discriminate between misrepresentation and intentional
> > damage. The details of the spacing between the questions and their
> > phrasing are experimental variables in the hands of the data scientist
> > for gathering useful signals.
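[One way to sketch that dissimilarity measure is cosine distance over word counts, a bag-of-words toy with invented answers; a production system would use richer text features.]

```python
import math
from collections import Counter

def cosine_distance(text_a, text_b):
    """1 - cosine similarity over lowercase word counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1 - dot / norm

# Two answers to the same question, phrased differently (invented text).
answer_1 = "the basement flooded after the storm last night"
answer_2 = "the storm last night flooded the basement"
answer_3 = "my jewelry was stolen from the car"

consistent = cosine_distance(answer_1, answer_2)    # small distance
inconsistent = cosine_distance(answer_1, answer_3)  # large distance
```

A consistent claimant's rephrased answers should sit at a small distance, so a large distance between answers to the same underlying question is one candidate signal for the discriminative model.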
> >
> > #### Social variables
> >
> > Social variables can be obtained from aspects of social interaction, such
> > as association with similar groups, participation in similar events, or
> > mining social media sites such as Facebook, Twitter, Snapchat, etc. The
> > usage of social variables stems from the phrase "neurons that fire
> > together wire together", implying that if one person filed a claim with
> > misrepresentation or intentional damage, another person could be
> > correlated with doing so through social bonds.
> >
> > Personally, I am not a proponent of using social variables but in some
> > cases they can provide useful information akin to a prior for the model.
> > A data scientist needs to be careful in ensuring the prior or social
> > variables can be overcome by evidence in either direction.
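[The "prior overcome by evidence" point can be sketched with a Beta-Binomial update, all numbers invented: a social-variable prior leaning toward fraud is washed out as clean observations accumulate.]

```python
# Beta-Binomial sketch: a social-signal prior on fraud probability,
# updated by observed claim outcomes. All numbers are invented.

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

# Social variables suggest elevated fraud risk: prior Beta(3, 2),
# i.e., a prior mean of 0.6.
alpha, beta = 3.0, 2.0
prior_mean = beta_mean(alpha, beta)

# Evidence arrives: 20 of this claimant's claims check out clean,
# 1 looks fraudulent.
fraud_obs, clean_obs = 1, 20
posterior_mean = beta_mean(alpha + fraud_obs, beta + clean_obs)
# The posterior mean has moved well below the social prior:
# evidence in either direction can overcome the prior.
```

The design choice to watch for is the prior's strength (alpha + beta): a weakly held social prior like Beta(3, 2) is overturned by a handful of observations, whereas a heavily weighted one would dominate the model's output, which is exactly the caution raised above.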
> >
> > #### Economic variables
> >
> > ## Identifying measurement variables
> >
> > ### Correlation
> >
> > ### Separation of classes
> > # articles for reinforcement learning
> > @article{vinyals2017starcraft,
> >   title={StarCraft II: A new challenge for reinforcement learning},
> >   author={Vinyals, Oriol and Ewalds, Timo and Bartunov, Sergey and Georgiev, Petko and Vezhnevets, Alexander Sasha and Yeo, Michelle and Makhzani, Alireza and K{\"u}ttler, Heinrich and Agapiou, John and Schrittwieser, Julian and others},
> >   journal={arXiv preprint arXiv:1708.04782},
> >   url={https://arxiv.org/pdf/1708.04782},
> >   year={2017}
> > }
> > @article{dulac2019challenges,
> >   title={Challenges of real-world reinforcement learning},
> >   author={Dulac-Arnold, Gabriel and Mankowitz, Daniel and Hester, Todd},
> >   journal={arXiv preprint arXiv:1904.12901},
> >   url={https://arxiv.org/pdf/1904.12901},
> >   year={2019}
> > }
> >
> > # articles on data science
> > @article{engelhardt2011your,
> >   title={This is your brain on violent video games: Neural desensitization to violence predicts increased aggression following violent video game exposure},
> >   author={Engelhardt, Christopher R and Bartholow, Bruce D and Kerr, Geoffrey T and Bushman, Brad J},
> >   journal={Journal of Experimental Social Psychology},
> >   volume={47},
> >   number={5},
> >   pages={1033--1036},
> >   year={2011},
> >   url={https://hal.archives-ouvertes.fr/peer-00995254/document},
> >   publisher={Elsevier}
> > }
> > @article{kuhn2019does,
> >   title={Does playing violent video games cause aggression? A longitudinal intervention study},
> >   author={K{\"u}hn, Simone and Kugler, Dimitrij Tycho and Schmalen, Katharina and Weichenberger, Markus and Witt, Charlotte and Gallinat, J{\"u}rgen},
> >   journal={Molecular Psychiatry},
> >   volume={24},
> >   number={8},
> >   pages={1220--1234},
> >   year={2019},
> >   url={https://www.nature.com/articles/s41380-018-0031-7},
> >   publisher={Nature Publishing Group}
> > }
> > @article{fisher1936use,
> >   title={The use of multiple measurements in taxonomic problems},
> >   author={Fisher, Ronald A},
> >   journal={Annals of Eugenics},
> >   volume={7},
> >   number={2},
> >   pages={179--188},
> >   year={1936},
> >   url={https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1469-1809.1936.tb02137.x},
> >   publisher={Wiley Online Library}
> > }
>

To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/1906411e-f239-4fb3-bc83-a279b167d101n%40googlegroups.com.

[-- Attachment #1.2: Type: text/html, Size: 19975 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-06-16 21:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16 17:01 Unable to generate citations in markdown_strict kiran kumar
     [not found] ` <f0b66f7d-b530-4c2e-9979-2fd40ff51dd4n-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org>
2021-06-16 17:19   ` Joseph Reagle
2021-06-16 19:34   ` John MacFarlane
     [not found]     ` <m21r9138os.fsf-jF64zX8BO0+FqBokazbCQ6OPv3vYUT2dxr7GGTnW70NeoWH0uzbU5w@public.gmane.org>
2021-06-16 21:53       ` kiran kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).