<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>paperless-ngx on foosel.net</title><link>https://foosel.net/tags/paperless-ngx/</link><description>Recent content in paperless-ngx on foosel.net</description><generator>Hugo</generator><language>en-us</language><copyright>Gina Häußge (foosel)</copyright><lastBuildDate>Wed, 06 Dec 2023 00:00:00 +0000</lastBuildDate><atom:link href="https://foosel.net/tags/paperless-ngx/feed.xml" rel="self" type="application/rss+xml"/><item><title>TIL: How to force paperless-ngx to consume signed PDFs</title><link>https://foosel.net/til/2023-12-06-how-to-force-paperless-to-consume-signed-pdfs/</link><pubDate>Wed, 06 Dec 2023 00:00:00 +0000</pubDate><guid>https://foosel.net/til/2023-12-06-how-to-force-paperless-to-consume-signed-pdfs/</guid><description>&lt;p&gt;I use &lt;a href="https://foosel.net/"&gt;paperless-ngx&lt;/a&gt; to manage my documents, together with some rules that automatically
ingest PDFs from my mailboxes. However, I noticed that a recently received invoice from AWS
was not ingested as expected. Looking at the logs I found this error message for it:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;invoice.pdf: Error occurred while consuming document invoice.pdf: DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;invalidating the signature.
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;I don&amp;rsquo;t know whether a software update introduced this refusal to run OCR on signed PDFs, or whether AWS
simply hadn&amp;rsquo;t sent me signed PDFs until now, but I needed to find a way to force paperless to
ingest signed documents as well, since having all of them stored in paperless is a vital part
of my accounting workflow.&lt;/p&gt;</description><content:encoded><![CDATA[<p>I use <a href="https://foosel.net/">paperless-ngx</a> to manage my documents, together with some rules that automatically
ingest PDFs from my mailboxes. However, I noticed that a recently received invoice from AWS
was not ingested as expected. Looking at the logs I found this error message for it:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>invoice.pdf: Error occurred while consuming document invoice.pdf: DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document,
</span></span><span style="display:flex;"><span>invalidating the signature.
</span></span></code></pre></div><p>I don&rsquo;t know whether a software update introduced this refusal to run OCR on signed PDFs, or whether AWS
simply hadn&rsquo;t sent me signed PDFs until now, but I needed to find a way to force paperless to
ingest signed documents as well, since having all of them stored in paperless is a vital part
of my accounting workflow.</p>
<p>A quick search for the error message brought me to
<a href="https://github.com/paperless-ngx/paperless-ngx/discussions/4047">this discussion on the paperless-ngx GitHub repository</a>
where I also found the <a href="https://github.com/paperless-ngx/paperless-ngx/discussions/4047#discussioncomment-7019544">solution</a>,
which is to set the <code>PAPERLESS_OCR_USER_ARGS</code> config option to
<code>{&quot;invalidate_digital_signatures&quot;: true}</code>.</p>
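<p>The paperless-ngx documentation describes this option as a JSON dictionary of extra OCRmyPDF arguments, so the value has to parse as JSON (note the lowercase <code>true</code>). A minimal sketch for sanity-checking the string before putting it into the config, using only Python&rsquo;s standard <code>json</code> module; the variable names are just for illustration:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>import json
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># The value from above, exactly as it should end up in the config option.
</span></span><span style="display:flex;"><span>value = '{"invalidate_digital_signatures": true}'
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># json.loads raises json.JSONDecodeError if the string is not valid JSON.
</span></span><span style="display:flex;"><span>args = json.loads(value)
</span></span><span style="display:flex;"><span>
</span></span><span style="display:flex;"><span># The option is expected to be a JSON object, i.e. a dict once parsed.
</span></span><span style="display:flex;"><span>assert isinstance(args, dict)
</span></span><span style="display:flex;"><span>print(args)  # prints {'invalidate_digital_signatures': True}
</span></span></code></pre></div><p>A common slip this would catch is writing Python-style <code>True</code> instead of JSON&rsquo;s <code>true</code>, which makes <code>json.loads</code> fail.</p>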
<p>As I run paperless via Docker, I needed to add the following to the <code>environment</code> section in my paperless
<code>docker-compose.yml</code>:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-yaml" data-lang="yaml"><span style="display:flex;"><span><span style="color:#f92672">services</span>:
</span></span><span style="display:flex;"><span>  <span style="color:#75715e"># ...</span>
</span></span><span style="display:flex;"><span>  <span style="color:#f92672">paperless</span>:
</span></span><span style="display:flex;"><span>    <span style="color:#75715e"># ...</span>
</span></span><span style="display:flex;"><span>    <span style="color:#f92672">environment</span>:
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># ...</span>
</span></span><span style="display:flex;"><span>      <span style="color:#f92672">PAPERLESS_OCR_USER_ARGS</span>: <span style="color:#e6db74">&#39;{&#34;invalidate_digital_signatures&#34;: true}&#39;</span>
</span></span><span style="display:flex;"><span>      <span style="color:#75715e"># ...</span>
</span></span></code></pre></div><p>After adding this and running a quick <code>docker compose up -d</code>, things now seem to work as expected
again. Yay!</p>
]]></content:encoded></item></channel></rss>