Insecure IDE Code Completion Is a Security Defect, Even If It Is Not a CVE

Seth Larson’s PyCharm test is a useful reminder that code completion models are now close enough to production code paths that insecure suggestions need security engineering, not just product polish.

Seth Larson’s June 10, 2026 post, “Are insecure code completions a vulnerability?”, describes a small but sharp failure mode in AI-assisted development. In PyCharm, JetBrains’ Full Line Code Completion suggested two lines around urllib3 that security reviewers would normally treat as red flags: suppressing InsecureRequestWarning, then constructing a urllib3.PoolManager with cert_reqs='CERT_NONE'.

That is not a theoretical style issue. In urllib3, disabling certificate verification on HTTPS requests removes the check that the remote endpoint is who it claims to be. The library’s own TLS warning documentation says InsecureRequestWarning appears when an HTTPS request is made without certificate verification, and it frames unverified HTTPS as strongly discouraged. If an IDE suggests both “turn off the warning” and “turn off the check,” the model is not merely completing syntax. It is nudging the developer toward a vulnerability-shaped program.

What is claimed

Larson’s core claim is narrow and more useful because of it. He is not arguing that PyCharm itself lets an attacker execute code, steal credentials, or bypass isolation. He is asking whether a code generation system that repeatedly proposes insecure code should be treated as having a security vulnerability, especially when the suggestion appears in the editor with the same ergonomic path as ordinary completion: press Tab, accept the line, keep moving.

The reported versions matter. Larson says he reported the behavior against “Full Line Code Completion” v253.29346.142, then retested roughly 90 days later against v261.24374.152 and saw the same suggestions. The example starts with import urllib3, then u, where the completion offers urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning). After beginning a PoolManager, the model suggests cert_reqs='CERT_NONE'.

JetBrains describes Full Line Code Completion as a local deep learning feature that suggests entire lines of Python, JavaScript, TypeScript, or CSS code without sending code to the internet. Its public research paper, Full Line Code Completion: Bringing AI to Desktop, describes a production system for multi-token code completion in JetBrains IDEs. The earlier Context Composing for Full Line Code Completion paper calls out a Transformer model as the core of the implementation.

The product claim is productivity. JetBrains’ ICSE paper reports that online evaluation led to 1.3x more Python code in the IDE being produced by code completion. That is a real metric, but it also sharpens the security question. A tool that inserts more code into real projects has to care not only about syntactic validity and acceptance rate, but about whether accepted completions move users toward known unsafe defaults.

What is actually new

The interesting part is not that a code model can emit insecure code. That has been known for years. In Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions, researchers generated 1,689 programs across 89 security-relevant scenarios and found about 40 percent were vulnerable. That work looked at GitHub Copilot in its Codex-era form and helped establish a basic pattern: code models learn common code, and common code includes insecure cargo-cult snippets.

Larson’s example is different because of where the suggestion appears. This is not a chat response surrounded by prose. It is not an answer to “show me an insecure TLS example.” It is an inline IDE completion in the normal act of programming. The UI presents it as the next likely line, and the acceptance gesture is low-friction. That creates a different risk profile from a generated file or a copy-pasted answer from a chatbot.

This is also a useful example because the insecure pattern is compact and semantically obvious. cert_reqs='CERT_NONE' is not a subtle SQL injection sink buried under application logic. Suppressing InsecureRequestWarning is not ambiguous when it appears next to disabled certificate verification. A reasonable static rule could flag it. A security-aware post-filter could penalize it. A targeted regression test could lock it down. If a completion system cannot suppress this class of suggestion, it is hard to believe it has meaningful coverage for more contextual issues such as authorization checks, tenant isolation, path traversal, or unsafe deserialization.

That does not automatically mean “assign a CVE.” CVEs work best when the vulnerable component has a defect that can be exploited under a reasonably defined threat model. Here the immediate vulnerable artifact is the user’s future program, not the IDE binary. The IDE suggests the code, but the developer accepts it, the project ships it, and the runtime exposure lives elsewhere. That chain makes CVE assignment awkward.

A better framing is that insecure completions are security-relevant product defects with measurable risk. They deserve tracking, regression tests, and release notes when fixed. They may deserve coordinated disclosure when the suggestion is likely to produce high-impact vulnerabilities at scale. They do not always fit the conventional CVE model, but “not a CVE” should not become “not security.”

Why this happens

Full-line completion models are optimized around likelihood under context. If the training corpus contains many Stack Overflow answers, test snippets, debugging workarounds, and internal scripts that disable TLS checks to get past local certificate problems, the model can learn that urllib3.disable_warnings() and CERT_NONE often co-occur. From a next-token objective, that association is a success. From a security objective, it is a failure.

This is the old autocomplete problem with a larger blast radius. Classic IDE completion usually offered API names, methods, and symbols. It might help you discover PoolManager, but it would not usually synthesize a full insecure configuration. A neural completion model can produce the argument list, the warning suppression call, and the local convention around them. The more complete the generated line, the more policy the model smuggles into the code.

There is also a mismatch between benchmark incentives and security outcomes. Acceptance rate, edit distance, latency, and syntactic correctness are easy to measure. JetBrains’ 1.3x productivity-style metric tells us developers accept and produce more code with the system enabled. It does not tell us whether accepted code is safer, easier to review, or less likely to contain CWE-class defects. Security benchmarks are harder because the right answer often depends on intent. Disabling certificate verification can be acceptable in a tightly scoped test fixture, but it is dangerous in production HTTP clients. The model needs enough context to distinguish those cases, or the product needs guardrails when context is insufficient.

This is where simple mitigations can beat broad claims about “AI safety.” For known-dangerous completions, the IDE can apply deterministic checks. If the candidate contains CERT_NONE, verify=False, disable_warnings(InsecureRequestWarning), shell=True with interpolated input, pickle.loads on untrusted bytes, or hardcoded SECRET_KEY, the completion engine can suppress it, downgrade it, or require an explicit user action with a short warning. That is not a complete solution, but it handles the cheap wins.

The next layer is evaluation. Completion vendors should maintain security regression suites for high-signal APIs, including Python TLS clients, JavaScript HTML injection sinks, SQL query construction, cloud IAM policies, JWT validation, password hashing, randomness, and command execution. The suite should test both obvious unsafe prompts and ordinary partial-code contexts where insecure snippets are common in public code. A model that passes HumanEval but suggests disabled TLS verification in a blank application file has not earned much trust.

Practical applications and consequences

For individual developers, the lesson is concrete: treat inline AI completion as unreviewed code from a junior contributor with very fast typing. That sounds glib, but it maps to an actual workflow. You accept suggestions only when you understand the API contract, and you let linters, type checkers, dependency scanners, and security rules inspect the result. For Python projects, tools such as Bandit and Semgrep can catch many known dangerous idioms, though they will not understand every application-specific security invariant.

For teams, this should feed into code review policy. “AI-generated” is not the key label. The key label is “code not intentionally designed by the author.” If completions are enabled, reviewers should expect to see plausible but unsafe defaults. TLS verification, auth bypasses in tests that leak into application code, permissive CORS settings, weak randomness, and broad exception swallowing are all patterns that look helpful during implementation and expensive after deployment.

For IDE vendors, the practical application is product design. Inline completion is not a neutral transport. Ranking, filtering, visual treatment, shortcut defaults, and warning behavior all shape what code lands in repositories. A local model is attractive for privacy and latency, and JetBrains is right to emphasize that code does not need to leave the machine for Full Line Code Completion. But local inference does not make unsafe output harmless. Privacy is one security property, not the whole thing.

For vulnerability programs, Larson’s post exposes a process gap. If a reporter files a reproducible case where a product feature suggests a known insecure pattern, the vendor needs a category for it. Calling it a “direct security vulnerability” may be too strong. Asking the reporter not to publish under a coordinated disclosure policy while also saying it is not a direct vulnerability creates muddled incentives. The clean answer is to define an “AI security quality defect” class with severity guidance, expected response timelines, and public fix criteria.

Limitations

There are real limits to what this single example proves. We do not have a public benchmark run for PyCharm Full Line Code Completion v253.29346.142 or v261.24374.152. We do not know how often the unsafe urllib3 suggestions appear across machines, projects, seeds, local model packages, or surrounding code. We also do not know whether the model would make safer suggestions in a project with stricter linting, tests, or existing TLS helper functions.

The example also does not establish exploitability of PyCharm itself. No attacker is shown controlling the model output. No sandbox boundary is crossed. No JetBrains server is implicated, and the Full Line feature is explicitly documented as running locally. If the question is “does this deserve a CVE against PyCharm?”, the conservative answer is probably no, at least for this isolated behavior.

But the narrow CVE answer should not bury the engineering issue. The model suggested a two-step path to making HTTPS requests vulnerable to man-in-the-middle attacks, including suppressing the warning that would have told the developer something was wrong. That is exactly the kind of failure a security-aware completion product should be able to identify and prevent.

The mature position is neither panic nor dismissal. Code completion models are useful, and the benchmark evidence says they can increase the amount of code developers accept from the IDE. That makes their failure modes operationally important. Insecure completions are not always vulnerabilities in the product that emits them, but they are security defects in the development system. Vendors should measure them, publish enough methodology to be trusted, and fix the obvious cases before asking developers to treat generated code as a normal part of the toolchain.