From mboxrd@z Thu Jan 1 00:00:00 1970
Newsgroups: gmane.text.pandoc
Subject: AW: I want to extract bibliographic data from Amazon pages
Date: Sat, 10 Dec 2022 11:44:32 +0000
Message-ID: <03d11be1c7b64ed0b31a56f5eb209f88@unibe.ch>
Reply-To: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
X-Original-Sender: denis.maier-NSENcxR/0n0@public.gmane.org

Although it might be possible with pandoc, I doubt it is the best tool for the job.

I'd use a language such as Python (or whatever you are most comfortable with) for that. You'll need to do some web scraping, read the relevant parts of the webpage, and output BibTeX. I bet there's a library for the last part.

For the scraping part you can have a look at what the Zotero importer does:
https://github.com/zotero/translators/blob/master/Amazon.js

________________________________________
From: pandoc-discuss-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org on behalf of Trevor Jenkins
Sent: Saturday, 10 December 2022 10:05:36
To: pandoc-discuss
Subject: I want to extract bibliographic data from Amazon pages

My current workflow for getting bibliographic data from Amazon’s book listings is failing. I use BibDesk as my primary citation manager, but it does not extract data from Amazon listings, so for that I use a lashed-up scheme built on Zotero. Zotero has a browser add-on which extracts the bibliographic information from these pages. Then in Zotero I have a third-party script that sends that data to BibDesk. This has worked well for a year or more.

However, there are two problems with my method. The first is that the third-party script for extraction from Zotero does not work with the current version of the program. I downgraded Zotero to an earlier version and that restored my workflow. Unfortunately, it now appears that changes to the browser add-on are not compatible with that older version, and my workflow is now dammed as it may or may not add the data to Zotero.

As pandoc can process both HTML and BibTeX formats, I wonder if and how I could harness that capability to finally drop Zotero altogether, as it was only ever meant to be a stopgap anyway. A simplistic

    pandoc -f html -t bibtex …

using the specific URL for the book I want to add does not work; I did not expect it to. That leaves me wondering whether a Lua script might be required to do the job. I am not conversant with Lua at all, so my idea is on hold.

Is it possible to get pandoc to do the required extraction, and if so, what might a Lua script look like?

Regards, Trevor.

<>< Re: deemed!

--
You received this message because you are subscribed to the Google Groups "pandoc-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pandoc-discuss+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To view this discussion on the web visit https://groups.google.com/d/msgid/pandoc-discuss/C57B5FA0-9810-4234-A8A8-C828D6CF27F6%40gmail.com.
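
For what it's worth, the approach suggested in the reply (scrape the page, read the relevant parts, output BibTeX) could be sketched roughly as follows in Python, using only the standard library. This is a minimal sketch under loud assumptions: the element ids in WANTED and the sample HTML are hypothetical and will not match a real Amazon page (the Zotero translator linked in the reply shows which parts of the page it actually reads), fetching the page is left out, and the names FieldExtractor, to_bibtex and html_to_bibtex are made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical mapping from element ids to BibTeX field names.
# Real pages use different markup; adjust after inspecting the page.
WANTED = {
    "book-title": "title",
    "book-author": "author",
    "book-publisher": "publisher",
    "book-year": "year",
}


class FieldExtractor(HTMLParser):
    """Collect the text content of elements whose id is listed in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.fields = {}
        self._current = None  # BibTeX field currently being captured
        self._tag = None      # tag name of the element being captured
        self._depth = 0       # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self._current is None:
            field = self.wanted.get(dict(attrs).get("id"))
            if field:
                self._current, self._tag, self._depth = field, tag, 1
                self.fields[field] = ""
        elif tag == self._tag:
            self._depth += 1

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join(self.fields[self._current].split())
                self.fields[self._current] = text
                self._current = None


def to_bibtex(key, fields):
    """Format a dict of fields as a @book entry (no escaping of braces)."""
    lines = [f"@book{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


def html_to_bibtex(html, key):
    parser = FieldExtractor(WANTED)
    parser.feed(html)
    return to_bibtex(key, parser.fields)


# Made-up sample input standing in for a downloaded product page.
SAMPLE_HTML = """
<html><body>
  <span id="book-title"> An Example Book </span>
  <span id="book-author">Jane <b>Doe</b></span>
  <span id="book-publisher">Example Press</span>
  <span id="book-year">2020</span>
</body></html>
"""

if __name__ == "__main__":
    print(html_to_bibtex(SAMPLE_HTML, "doe2020"))
```

The resulting entry can be written to a .bib file and opened in BibDesk. A dedicated BibTeX library would handle escaping and name formatting more carefully than the plain string formatting used here.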