The one about using print/console.log/whatever lol.
I've always done this, never have I used a library for this, because:
- running manually? >myapp.log 2>&1
- using systemd? use journalctl
- using docker/kubernetes? capture automatically the stdout/stderr of your containers and pipe them through logstash or something
Real question: why would an application need to know where its logs go? This is not in the business perimeter, but the ops perimeter. Am I wrong?
EDIT:
Most responses are about configuring logging level, output format, etc... This can be done with a print-based approach without sending a request to an external service (which would need to trust you and your inputs).
I think you misunderstand the point of log4j. Logging is not free, and outputting to all three of those is quite expensive (especially in the Docker case, and it can even be dangerous as Docker - at least for a long time - did not implement proper backpressure and instead dropped logs).
log4j allows libraries to implement logging and lets the end user worry about where the logs go, at the application level, usually via config on the command line (e.g. modifying classpaths).
Also note that log4j comes from an era before everything was "cloud-native".
In the Java world before cloud-native we had application servers like Tomcat and JBoss which typically used something like log4j (log4j-core package) to configure what gets logged and where the logs should be shipped to. These application servers were shared and usually hosted more than one application. Hence the need to do contextual lookups (JNDI and others) to determine log routing or enrich log messages with contextual information.
Applications or libraries running inside these application servers would only implement a logging API (log4j-api package) which is just a facade, so every application can write implementation-independent logging and let the application server provide the implementation and configuration.
Now in the cloud-native/12-factor world, the industry has moved away from application servers (or embeds an app-specific server in the container) so it makes sense to just log to stdout and let the container orchestration handle log shipping and filtering. The former role of log4j is now mostly handled by Fluentd, Logstash, Filebeat and friends.
Log4j still has a place though: Having the ability to use ENV vars or commandline flags to influence log levels and packages is still a nice to have. And one feature that is often overlooked is the ability to set contextual information per request. For example authentication middleware can set a userId or requestId in the request context and every log message within the request handling automatically gets logged with that metadata, avoiding the need to pass all this contextual information manually in every log.info(...) statement.
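To make the per-request context idea concrete, here is a rough sketch using Python's stdlib logging rather than Log4j (Log4j has its own mechanism for this, which I'm not reproducing here). The `request_id` variable, the `handle_request` function and the `ctx_demo` logger name are all made up for illustration; a real app would set the context in its middleware.

```python
import contextvars
import io
import logging

# Hypothetical per-request context; real middleware would set this per request.
request_id = contextvars.ContextVar("request_id", default="-")


class RequestContextFilter(logging.Filter):
    """Copy the current request id onto every record that passes through."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def build_logger(stream):
    """Build a logger whose format string includes the request id."""
    logger = logging.getLogger("ctx_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s [req=%(request_id)s] %(message)s")
    )
    handler.addFilter(RequestContextFilter())
    logger.handlers = [handler]
    return logger


def handle_request(logger, rid):
    """Stand-in for a request handler; note it never passes rid to the log call."""
    token = request_id.set(rid)
    try:
        logger.info("loading profile")
    finally:
        request_id.reset(token)


out = io.StringIO()
log = build_logger(out)
handle_request(log, "abc123")
print(out.getvalue(), end="")
```

The log call itself knows nothing about the request; the filter attaches the id, which is the part that is hard to get from plain print statements.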
I'm not entirely sure libraries logging is a good idea anyway. I think there is a strong case that logging should be entirely an application concern not a library concern.
Have you never had to rely on logs from some JDBC library to figure out an issue in production? There are absolutely many reasons for a library to have logging, especially in the case of libraries providing tooling for anything that deals with networking (clients to any kind of services, serving requests through the network, etc.).
I think that an opaque library in terms of logging is absolutely hell to deal with, I already have my fair share of issues with libraries that don't do enough logging and require painful hours of debugging instead of reading a log...
I've got a bug... the application that I've deployed isn't working in this configuration. It's not even hitting my controller for me to be able to log to a file.
So, I kick the log level from warn to debug and I see that Spring is booting it because it failed a CORS check. This was much easier than trying to reproduce it in my local system and attach a debugger to it.
There is a lot that goes on beneath and behind the code that a person writes within the libraries. I would contend that the log4j vulnerability is because people didn't realize the extent of what was going on beneath and behind.
Making a developer blind to what goes on inside a library is not the answer. The "what goes on inside of a library" is indeed a concern for the application and the developer and should be made as easy to access as possible.
Is this partly due to the fact that everyone is depending on the libraries having logs instead of throwing the correct exception? Then the application could log it.
A program working correctly with incorrect input isn't an exceptional case.
"This endpoint cannot be accessed in a CORS improper way" is caught in the library/framework level and doesn't even touch my code - there's no exception for me to catch.
Throwing a checked exception for no access rather than returning "false" from a library call because you passed in all upper case on case sensitive role check would be a decision that I wouldn't agree with.
Another example would be "I want to see the SQL generated by Hibernate" - that's not exception throwing at all, it's me trying to debug what the query is and why the performance is awful.
The libraries being able to log what they do helps a lot when debugging production issues. The application does not see inside cache library or hibernate or what not. They themselves being able to log what they do inside helps a lot.
Built-in logging libraries offer more flexibility: you can set the log level of each module to avoid flooding your log files with everything's debug logs, for example. Or you want to write multiple log files with different settings.
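A small sketch of the per-module level idea, using Python's stdlib logging as the example library. The logger names `myapp`, `myapp.db` and `myapp.http` are hypothetical; the point is only that one module gets turned up to DEBUG while everything else stays at WARNING.

```python
import io
import logging

out = io.StringIO()
handler = logging.StreamHandler(out)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

# Hypothetical application logger: quiet by default.
app = logging.getLogger("myapp")
app.setLevel(logging.WARNING)
app.propagate = False
app.handlers = [handler]

# Child loggers inherit the WARNING level unless they set their own.
db = logging.getLogger("myapp.db")
http = logging.getLogger("myapp.http")

# Turn up only the module being debugged.
db.setLevel(logging.DEBUG)

db.debug("SELECT 1")          # emitted: myapp.db is at DEBUG
http.debug("GET /health")     # dropped: myapp.http inherits WARNING
http.warning("slow response") # emitted

print(out.getvalue(), end="")
```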
Also legacy; before systemd/docker/kubernetes, applications had to send to syslog themselves.
Some log libraries allow user configuration of logging. Some configurations can be as mundane as altering logging levels when debugging; other configurations could redirect the logs away from text files to something that preserves semantics.
But, depending on what your program is doing, printf/whatever can be "good enough."
Coming from the C# world, one of the nice things about log4net is that it has a standard exception formatter, and callbacks whenever anything logs to error. This makes it easier to log unexpected errors and phone home when they happen.
The real value is parameterized logging, sanitation of sensitive strings, logs per module, and automatic contextual value adds like function name, parameters, line number in the source code. Sysout is good for simple stuff, but a healthy set of audit logs (IE after every branch, and essentially replacing code comments with log statements) really cuts down on time needed to debug production problems.
> This is not in the business perimeter, but the ops perimeter.
That is a great point. Logging configuration should be provided at runtime, not compiled into the application. That's why most places provide the logging configuration as a runtime parameter.
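For what "provided at runtime" can look like in the simplest case, here is a sketch in Python that reads the level from an environment variable. The variable name `APP_LOG_LEVEL` and the helper `level_from_env` are invented for this example, not a convention of any library.

```python
import logging
import os

# Explicit mapping so that an unknown value falls back predictably.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def level_from_env(var_name="APP_LOG_LEVEL", default=logging.WARNING):
    """Return the log level named in the environment, or the default."""
    raw = os.environ.get(var_name, "")
    return _LEVELS.get(raw.strip().upper(), default)


# Ops decides the level at deploy time; the code only reads it.
logging.getLogger("env_demo").setLevel(level_from_env())
```

Usage would be something like `APP_LOG_LEVEL=debug python app.py`, with no rebuild needed.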
Yeah as a non-Java user I have to say I'm very surprised. I wouldn't recommend just using print(), but a library needs to do very little (format like printf, filter by level (maybe depending on source file), output to text or JSON).
I don't understand how people would use a library as massive as log4j and not bother looking at what's in the box, and what they need to enable/disable. Blaming it on the opensource product seems weird when you grabbed it for free and didn't even read what's written on the box.
Not entirely wrong, but supporting log levels natively is a fairly decent “niceness” instead of parsing text to do it.
Text, though, is a pretty universal interface and I would rather wrap a binary and stream its output than integrate a logging library natively.
I did work with a product once that had local logs in a ring buffer and only shipped “important” logs to centralised storage, and that was nice but required context which can’t be easily gleaned from the content of a message.
Imagine a large system that has lots of log.DEBUG statements in its code. If you turn on DEBUG logging, then it logs, say, 10 TB/day. This is a meaningful cost for most organisations, and it is also hard to search. Also, you mostly don't need debug level logs. One thing a logging library gives you is that you can toggle log levels very granularly, e.g. you turn on debug logging for a specific class.
And with templates, you do log.debug("{} {}", var1, var2, someThrowable) and don't even call toString on those unless you're actually logging them.
Compare this to System.err.println("DEBUG: " + var1 + " " + var2 + " " + someThrowable.stackTrace()) -- that creates a StringBuilder, allocates a potentially large String, and then sends it on.
There are numerous performance improvements from using a logging framework that doesn't build or allocate a String for messages it isn't going to log.
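The same effect is easy to demonstrate in Python's stdlib logging, which I'm using here only because it is simple to run in isolation; it is not the Java code above. The `Expensive` class is a made-up stand-in that counts how often its string form gets built.

```python
import logging


class Expensive:
    """Stand-in for an object whose string form is costly to build."""

    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive-value"


logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)  # DEBUG is disabled

value = Expensive()

# Template style: the argument is only formatted if the record is emitted.
logger.debug("state: %s", value)
calls_after_template = Expensive.calls

# Eager concatenation: the string is built before the logger can decline it.
logger.debug("state: " + str(value))
calls_after_concat = Expensive.calls

print(calls_after_template, calls_after_concat)
```

With DEBUG off, the template call never touches `__str__`, while the concatenated version pays for the string and then throws it away.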
There’s a whole enormous feature set that goes with Java logging that rapidly becomes a huge pain, but being able to easily adjust log levels on a per-class basis is a godsend for debugging certain issues.
Just output JSON! Most log systems will happily consume JSON, and you can easily run it through `jq -r` if you want to read it in your terminal/vi/emacs or use grep.
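A minimal sketch of what that could look like with Python's stdlib logging. The field names (`level`, `logger`, `message`) are my own choice here, not a standard that log shippers require.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


out = io.StringIO()  # stand-in for stdout so the result can be inspected
handler = logging.StreamHandler(out)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("json_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers = [handler]

logger.info("user %s logged in", "alice")
line = out.getvalue().strip()
print(line)
```

Because `json.dumps` escapes newlines inside strings, each record stays on a single line, which also helps with line-oriented tools like grep.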
I spent this weekend removing Elasticsearch usage in our project. Just earlier I saw that our API wasn't responding, and was wondering how the hell someone with log4j got to us. Turns out AWS West 2 was down.
If it weren't for the fear of DDOS attacks, I'd put every webservice in a physical server in a basement somewhere and be done with it. I've been fighting this whole year to simplify and reduce the number of dependencies we use, but it's such an exhausting battle as it goes against the grain of the industry right now, and everyone almost actively works against you: build now, someone else may pay this technical debt.
> I'd put every webservice in a physical server in a basement somewhere and be done with it.
One of the benefits of working in healthcare is that if it's not on-premises, it's a non-starter for many organizations.
My company's paranoia of cloud services has paid off several times in terms of security in the last few years, even if the cost in dollars was higher. At this very moment, a vendor is going through a ransomware attack and its SaaS service is unavailable. Our locally-hosted version is unaffected.
We only rent other people's computers (cough "cloud" cough) when there is absolutely no alternative, including doing without. On the other hand, it's locked me out of a number of programs and services I'd like to use.
Still, startups have to learn that if you're serious about being in enterprise, having a locally hosted option opens doors. Very lucrative doors.
I hope no startups learn that lesson, but instead hone their products with quick turnaround from users and then, when they have achieved market fit, offer to sell their product to enterprise companies for a massive fee. And by package I mean they deliver some servers with Kubernetes on it and run the same software they run in the cloud.
If a startup tries to sell to the enterprise they are likely to die before they make enough in sales and enterprise concerns (such as on prem >> price) aren't usually shared with normal businesses.
Not saying you can't make a fortune selling overpriced products to large companies (hello Oracle) but it isn't a game for a start up.
> Still, startups have to learn that if you're serious about being in enterprise, having a locally hosted option opens doors. Very lucrative doors.
Can you explain why that is?
I do agree with your point. Being exposed to running OpenStack on a custom infrastructure for my job has really taught me how much $ you can save if you do things yourself over, say, AWS, and once I picked up the basics of compute, volumes, networking and security, my understanding increased.
I sometimes would find it difficult to keep up with all the aws lingo but doing everything myself has been a really good lesson in keeping things small and lean.
Usually has to do with security requirements, often dictated by the industry they're in. Health care and military contracting are both huge, and are both areas that large companies tend to end up serving even if they didn't set out to do that, with the result that lots of enterprises need to be able to satisfy a variety of security check-lists, most of which are easier to deal with if a service is self-hosted. This can include things like "must guarantee no traffic containing X goes over an insecure network" or "must guarantee no data ever transits or is stored in [LIST OF COUNTRIES]". These directives can be in conflict for different orgs, too, so it's not something you can just fix one time in your hosted version and appeal to everyone.
On a serious note, I think it's a good time to remember to donate some $ to open source, especially the Apache foundation for their incredible work over the years!
> All participants in ASF projects are volunteers and nobody (not even members or officers) is paid directly by the foundation to do their job. There are many examples of committers who are paid to work on projects, but never by the foundation itself.
You seriously think that the multi-million dollar Apache foundation, which oversaw this mess and whose "Apache Way" supposedly should have prevented it but in fact is a joke, needs more money?
There are of course many worse causes, but among umbrella organisations which just exist to provide some services to Free Software projects, Apache doesn't stand out as particularly good and I doubt that "Security vulnerabilities due to our lacklustre oversight drove a big increase in donations" is the lesson we ought to provide.
I would expect the foundation's "Apache Way" to have the effects it claims, rather than in fact being a way to dismiss concerns and pretend everything is on track when it isn't.
In particular the Apache Way includes: Responsible Oversight and The ASF Security Committee which you might think would be trying to stop stuff like this happening but really exists so that they can say they're responding to whatever new horrible problem has been found and so the system works.
What did the Responsible Oversight do with the idea of adding "lookups" to log4j which by the nature of the language and design of the API can't be safe? They accepted it and cheerfully documented this obviously bad idea. You can still go back and look at their documentation with the Wayback Machine, short of just writing "Look at this amazing remote execution security bug we added to our software" it could not be any clearer.
I disagree. If you are a software developer you should expect someone to pay you, so that you contribute to Open Source. Probably the bigger companies and indirectly the end user. It's time we break from this "free" stuff mentality. Nothing is free and even if we expect free stuff from developers, we shouldn't expect them to maintain that stuff for free at their expense.
Last time I needed an HTML/CSS framework, I was looking for the "best" open source and free one. I then decided to go with Tailwind and pay the $279 lifetime license fee. Now that I think more about it, I'd have been happier with a subscription or paid updates than a "lifetime" license.
Remember that the Apache Foundation is the same foundation that insists on saying that OpenOffice is a nice piece of software and LibreOffice doesn't exist. While ignoring the fact that continuing to promote abandonware (OpenOffice) to schools and other public institutions is criminal.
I have this specific rule turned on in NextDNS. It’s sometimes annoying, but seems like a reasonable policy to block any newly registered domains as extra protection against phishing attempts.
Spamhaus et al. (corporate subscription), but same thing. If there's a noteworthy new domain, we'll check and whitelist it, but otherwise it silently disables phishing attempts. The one we have (I don't know which specific lists it uses) also detects new GitHub, AWS (S3), Azure (Windows Blob), and Google (Appspot et al.) subdomains.
As a former red-teamer: we would keep long-registered but otherwise dormant domains that did the usual expected activities, like getting TLS certs, showing some mail and whatnot...
We have a client that would like to know if we might have installed log4j by accident. It’s really hard to explain to people with limited technical knowledge what a library is, or that if Java isn’t running, then neither is log4j. Their entire stack is written in Perl and it’s the only software running, except, yup, Apache (httpd). They are very confused.
Is there any comprehensive article that covers what log4j is and just what happened that is so critical that seems to have set the entire world on fire? Disclaimer: I have never heard of or used log4j before in my life.
I don't have an article but here's a super quick rundown. Log4j is a very common logging framework used in Java. It very often gets pulled in along with other dependencies, so it's easy to be using it without even realizing it. It has a feature that allows it to download and run code just by logging specially formatted strings. So if someone can cause your server to log these strings, it will run whatever code they want. On Minecraft servers, for example, all chat messages get logged at some point, so just posting the right string in chat will cause the server to execute the code you tell it to download.
Okay, I know I am not a Real Programmer, but even I know that user content is to be Not Trusted. Isn't it like a Security 101 principle that user content is always potentially dangerous, and to be treated accordingly?
> Meanwhile, library developers didn't realize app devs would be feeding untrusted strings to their library.
What?!
It's a logging library. I would expect users of it to be feeding user input, so that in the case of a bug, I can look at logs to see what input triggered a bug.
> It's a logging library. I would expect users of it to be feeding user input,
I agree with you, but I did see a couple of days ago someone with the diametrically opposite opinion: that we should never log user input, with a link to https://owasp.org/www-community/attacks/Log_Injection (plus this bug) as the justification.
Seems like a strange conclusion to draw. I mean, taking input from one user and presenting it to another creates the opportunity for XSS attacks, but obviously you wouldn't use that to argue that you should never show one user's input to another, because then no website could contain user-generated content. Forums would not exist and the entire web would be non-interactive.
Nah...logging user input is a must to be able to perform digital forensics and incident response. Certainly knowing exactly how an attack was triggered would help in preventing it in the future.
Just filter the CRs and LFs to prevent log forging, and make sure log files are not accessible from the web app. They should be in /var/log, not in the web root.
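A sketch of the CR/LF part in Python. I'm escaping the characters rather than dropping them so the log still shows what was actually sent; `neutralize_newlines` is a hypothetical helper name, and the forged line is just an example payload.

```python
def neutralize_newlines(value: str) -> str:
    """Escape CR and LF so one input can never span multiple log lines."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


# An attacker tries to append a fake log entry after their username.
forged = "guest\n2021-12-10 INFO admin logged in"
safe = neutralize_newlines(forged)

print("login attempt for user: " + safe)
```

The forged entry still shows up in the log, but visibly as part of the attacker's input instead of as its own line.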
Your instinct is right, and in hindsight that's easy to say. Practically speaking, however, "treated accordingly" generally means "do not execute this string as a command (or pass it to anything that might do so, like a SQL statement)". It was entirely reasonable (though, we have now discovered, false) to expect that a logging framework would not execute the strings that it received.
Yes, but oftentimes you want to log user generated events (especially ones that might otherwise be ephemeral) to create, well, logs, of what has happened. You expect the log library to dump whatever string you direct it to to the correct location (based on config and log level and etc), with any necessary sanitizing, and you otherwise forget about it. You don't expect the log library to try and execute any part of the string.
As best I've been able to tell, this was not an intentional feature; it was added by the original author for configuration, so that they could drop LDAP URL's into the log4j configuration file, thus using LDAP as a "configuration server". I don't think they realized this would cross paths with every single log message as well.
Mind you, I also think that original intention was idiotic: Now your java application can't boot up unless your LDAP server is working, and for what? That's the kind of thing that makes global restarts & outage recovery a disaster.
If you look at the change that caused this it was intended that incoming log statements pattern match jndi requests and run those. The only configuration in the change was to specify the pattern match for jndi. There was no intention to run jndi from the configuration, the intention was always to read log statements and run those.
It uses the routing appender that's intended to send logs different ways via string matching and added code that said if the string matching has ${jndi... it should run that jndi code.
The change did what it said it would do. Akin to someone submitting a patch to run eval(log_statement).
That it passed review and was accepted is frightening.
I will admit that I've used it, though. I have an IRC bot written in Python with a "!calc" command that calls eval(). But I use `ast` to build the syntax tree, then walk it and compare every AST object to a white list so you can only use numbers and math operators. Anything else, such as CALLs or strings will throw an error.
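Roughly what that kind of whitelist can look like. This is a sketch and not the bot's actual code: instead of checking the tree and then calling eval(), it evaluates the whitelisted nodes directly, so eval() is never involved. The name `safe_calc` and the choice of operators are mine; exponentiation is left out on purpose since it makes it easy to burn CPU.

```python
import ast
import operator

# Only these operators are allowed; everything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    """Recursively evaluate a node, refusing anything off the whitelist."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"disallowed expression element: {type(node).__name__}")


def safe_calc(expression: str):
    """Evaluate a plain arithmetic expression without ever calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate(tree.body)


print(safe_calc("2 + 3 * 4"))
```

Calls, names, attribute access and string literals all hit the final `raise`, so something like `__import__('os')` is rejected before anything runs.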
> Bane vs. Pink Guy, also known as Bane vs. Filthy Frank, is an image macro series based on a screenshot from the film The Dark Knight Rises altered to include the Filthy Frank character Pink Guy. In the image, the Batman villain Bane is preparing to fight to Pink Guy.
Unless you're happy in a slightly different way than expected. Oh and we aren't going to bother documenting what we mean by happy, and it may be different for different functions, and we may change it at any time as a side effect of normal code maintenance.
There's nothing about Python that actually stops this kind of attack. It just happens to be Java this time around. But providing expansive evaluation of variables is definitely something Python should be concerned about. This would be a good time to check if poor patterns around logging and having untrusted strings in trusted contexts exist in your code base, no matter the language.
My understanding is that Log4Shell vulnerabilities come from parsing `${jndi:ldap:path}` files coming from HTTP requests, which raises the question of why wasn't that input sanitised in the first place.
Aren't `[${}]` in `$GET` and other HTTP headers normally replaced with sanitised strings to prevent these kinds of vulnerabilities?
What kind of sanitation needs to be done depends on where the data is going. There is no "one size fits all" method of sanitizing because what sanitizes for one purpose will just make the data look like garbage in another.
I.e., if the User-Agent is being reflected back to a user on the web page, then characters such as < and > need to be changed to the HTML entities &lt; and &gt; respectively. If it's being put in a SQL query, then quotes (both single and double) and backticks need to be filtered. Of course, really you should be using parameterized queries so sanitation isn't necessary at all.
If your data is going straight into a log, nobody would expect that data needs to be sanitized, beyond possibly filtering \r and \n to prevent log forging via CRLF injection. I would expect it to not be sanitized, since then it's not clear based on the log what a user actually sent.
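To illustrate the "depends on where it's going" point, a short Python sketch that sends the same hostile User-Agent to two destinations. The table name `visits` and the sample string are made up; `html.escape` and sqlite3's `?` placeholders are standard library features.

```python
import html
import sqlite3

user_agent = "<script>alert('x')</script>"

# Destination 1: an HTML page. Escape the markup characters.
html_fragment = "<p>" + html.escape(user_agent) + "</p>"

# Destination 2: a SQL database. Use a parameterized query and store the
# value untouched; no escaping of the string itself is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_agent TEXT)")
conn.execute("INSERT INTO visits (user_agent) VALUES (?)", (user_agent,))
stored = conn.execute("SELECT user_agent FROM visits").fetchone()[0]

print(html_fragment)
print(stored)
```

HTML-escaping the value before storing it would have put entity-encoded garbage in the database, which is the "looks like garbage in another context" problem.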
Once again we didn't learn the lesson: don't trust user input. Always sanitize it, and it isn't sanitized unless you convert it to a known good condition.
https://jfrog.com/blog/log4shell-0-day-vulnerability-all-you... "If using log4j 2.10.0 or any later version, we recommend disabling message lookups globally by setting the environment variable LOG4J_FORMAT_MSG_NO_LOOKUPS to true by executing this command before Java applications are loaded in one of the system’s init scripts"
Most responses are about configuring logging level, output format, etc... This can be done with a print-based approach without sending a request to an external service (which would need to trust you and your inputs).
Example: Python's default logging module.
Still, thank you for your insightful answers :)