Newsgroups: gmane.text.pandoc
Subject: AW: I want to extract bibliographic data from Amazon pages
Date: Sat, 10 Dec 2022 12:39:36 +0000
Message-ID: <0394e3cb78574a3b986a66479e6253e8@unibe.ch>
References: <03d11be1c7b64ed0b31a56f5eb209f88@unibe.ch>
X-Original-Sender: denis.maier-NSENcxR/0n0@public.gmane.org
In-Reply-To: <03d11be1c7b64ed0b31a56f5eb209f88-NSENcxR/0n0@public.gmane.org>

Oh, and I'm almost certain there must be an existing command-line tool that lets you retrieve bibliographic data if you provide an ISBN or DOI. The data might not come from Amazon, but their data isn't the best anyway.

If you use Emacs, there's org-ref, which contains https://github.com/jkitchin/org-ref/blob/master/org-ref-isbn.el

________________________________________
From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf of denis.maier-NSENcxR/0n0@public.gmane.org
Sent: Saturday, 10 December 2022 12:44:32
To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: AW: I want to extract bibliographic data from Amazon pages

Although it might be possible with pandoc, I doubt it is the best tool for the job.

I'd use a language such as Python (or whatever you are most comfortable with) for that. You'll need to do some web scraping, read the relevant parts of the webpage, and output to BibTeX. I bet there's a library for the last part.

For the scraping part you can have a look at what the Zotero importer does: https://github.com/zotero/translators/blob/master/Amazon.js

________________________________________
From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf of Trevor Jenkins
Sent: Saturday, 10 December 2022 10:05:36
To: pandoc-discuss
Subject: I want to extract bibliographic data from Amazon pages

My current workflow for getting bibliographic data from Amazon’s book listings is failing. I use BibDesk as my primary citation manager, but it does not extract data from Amazon listings, so for that I use a lashed-up scheme involving Zotero. Zotero has a browser add-on which extracts the bibliographic information from these pages. Then in Zotero I have a third-party script that sends that data to BibDesk. This has worked well for a year or more.

However, there are two problems with my method. The first is that the third-party script for extraction from Zotero does not work with the current version of the program. I downgraded Zotero to an earlier version and that restored my workflow. Unfortunately, it now appears that changes to the browser add-on are not compatible with that older version, and my workflow is now dammed as it may or may not add the data to Zotero.

As pandoc can process both HTML and BibTeX formats, I wonder if and how I could harness that capability to finally drop Zotero altogether, as it was only ever meant to be a stopgap anyway.
A simplistic

pandoc -f html -t bibtex …

using the specific URL for the book I want to add does not work; I did not expect it to. That leaves me wondering whether a Lua script might be required to do the job. I am not conversant with Lua at all, so my idea is on hold.

Is it possible to get pandoc to do the required extraction, and if so, what might a Lua script look like?

Regards, Trevor.

<>< Re: deemed!

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/C57B5FA0-9810-4234-A8A8-C828D6CF27F6%40gmail.com.
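
For the last step suggested in the thread (taking the fields read from the page and writing them out as BibTeX), a minimal Python sketch might look like the following. It assumes the title, author and other fields have already been extracted by some scraper, which is not shown. The function names `make_cite_key` and `to_bibtex`, the field names, and the sample values are all hypothetical, and no escaping of special BibTeX characters is attempted.

```python
def make_cite_key(author: str, year: str) -> str:
    """Build a simple citation key from the first author's surname and the year."""
    first_author = author.split(" and ")[0].strip()
    if "," in first_author:
        # "Surname, Given" form: the surname is everything before the comma.
        surname = first_author.split(",")[0]
    else:
        # "Given Surname" form: take the last word as the surname.
        words = first_author.split()
        surname = words[-1] if words else "anon"
    key = "".join(ch for ch in surname.lower() if ch.isalnum())
    return f"{key or 'anon'}{year}"


def to_bibtex(fields: dict, entry_type: str = "book") -> str:
    """Render already-extracted fields as a single BibTeX entry (no escaping)."""
    key = make_cite_key(fields.get("author", ""), fields.get("year", ""))
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if value:  # skip fields the scraper could not find
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder values standing in for whatever a scraper returned.
scraped = {
    "author": "Doe, Jane",
    "title": "An Example Book",
    "publisher": "Example Press",
    "year": "2020",
    "isbn": "",  # not found on the page, so it is left out of the entry
}
print(to_bibtex(scraped))
```

The resulting text can be appended to a .bib file that BibDesk opens; an existing BibTeX library would be a more robust choice if the entries need escaping or validation.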