EC2 Serial Console (amazon.com)
183 points by TangerineDream 9 months ago | 117 comments




Wow. Out of curiosity I just checked whether the other cloud providers have it - both Azure and GCP already do. Azure got it around Feb 2020, GCP in Feb 2021.


Disclosure: I used to work on GCE (and was adjacent to the serial port work).

IIRC, we launched interactive serial port access sometime in late 2014. For example, mbrukman answered a SO question on Jan 2, 2015 with connect-to-serial-port [1]. I don’t recall when we gained fancier IAM controls for it, but we’ve had it forever (and I think getting / view only was there at public launch).

[1] https://stackoverflow.com/questions/27734763/how-do-you-acce...
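(For the curious, that's still roughly the flow today - a sketch with placeholder instance/zone names, assuming serial port access has been enabled on the instance:)

  gcloud compute instances add-metadata INSTANCE --metadata serial-port-enable=TRUE
  gcloud compute connect-to-serial-port INSTANCE --zone=ZONE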


I did the security review of the GCE serial port back when it first came into existence. We probably know each other. The tech industry really is tiny.


Yeah I just checked and you’re both listed as authors on the design doc (as am I, amusingly, though I really don’t remember doing anything particularly useful or significant for it).


You did the original demo/work! I’m really just taking credit for it :).

Edit: and really the person who did the most work isn’t mentioned here (that’d be up to them)


I will almost guarantee you two know each other (as someone who knows both of you from Google :)


Pictured: three Google engineers happily doxxing each other (c. 2021, colorized)


I feel like our usernames covered that...


Can we turn it into an NFT and get more Cereal boxes?


"Google-O's: Now With Even More O's!"


lol yes


Same feeling after the Cloud Shell release - around 4 years of waiting for it.


Oh thank god. I had a customer once that erased their SSH keys, and had a running database cluster on EC2 that they couldn't get access to anymore. That was... fun.

This is a long time coming.


I've used SSM Agent to get out of similar hot spots in the past, but this will be a nice option for those instances that are somehow both broken to the point where you can't connect to them and still important enough that they need to be up _now_.


Sure there’s downtime involved, but you can always stop the instance, mount the volumes elsewhere, and then update the SSH keys.
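Roughly, that goes something like this (a sketch - the IDs, device names, and paths below are placeholders, not anything from the article):

  aws ec2 stop-instances --instance-ids i-0123456789abcdef0
  aws ec2 detach-volume  --volume-id vol-0123456789abcdef0
  aws ec2 attach-volume  --volume-id vol-0123456789abcdef0 \
      --instance-id i-0fedcba9876543210 --device /dev/sdf
  # on the rescue instance:
  sudo mount /dev/xvdf1 /mnt
  sudo vi /mnt/home/ec2-user/.ssh/authorized_keys   # fix the keys
  sudo umount /mnt
  # then detach, re-attach to the original instance as its root device, and start it back up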


They could not accept downtime. They were also using a database that had its consistency guarantees scaled back for "web scale" reasons and were terrified of what a hard reset would do. Yes, the VM could just disappear randomly anyway, which made doing that a very poor choice.

There were a whole lot of questionable prior decisions that had been made that did not help.

You learn a lot in situations like this about being helpful with the customer without being judgey. Getting them back on their feet with a smile when they thought they were screwed keeps a lot of contracts around.


> You learn a lot in situations like this about being helpful with the customer without being judgey. Getting them back on their feet with a smile when they thought they were screwed keeps a lot of contracts around.

I like this paragraph. It says a lot about what divides the long-term contractors with a full pipeline from those who don't have one. It would be a neat topic to blog about if you ever have the time.


To use something like this you’d still need an SSH root password set up in advance. So it's not much different from not losing your SSH keys or creating fallback ones.

One other option is to exploit a bug in managed software to escape to a shell. One man’s CVE or backdoor is another support engineer’s magic sword to save the day.


My understanding from the post was that it requires a system root account, not ssh. The former implies the latter, but you could (and usually should) disable root ssh access.


Is that strictly true?

Other implementations I've seen drop you right into a root shell, relying on equivalents to IAM to govern access to the other side of the virtual serial port rather than machine local permissions.


Depends on how getty (or other local TTY manager) is set up. Most (if not all) Linux and BSD distributions attach login(1) to console and serial TTYs by default, and login(1) will normally require password authentication. Though, whether login prompts for a password is (I think) also a function of flags in /etc/passwd (or /etc/shadow or equivalent), so you may be able to login as `root` or `$USER` without entering a password if the account was set up that way.
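As a concrete Linux/systemd sketch - on most current distros the serial prompt is just an instance of the serial-getty template unit, which hands you off to login(1):

  # is anything attached to the first serial port?
  systemctl status serial-getty@ttyS0.service
  # if not, attach a getty (and keep it across reboots)
  sudo systemctl enable --now serial-getty@ttyS0.service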


if you have serial console access, can't you key-combo it to boot into single user mode?


That would still entail downtime, so if the goal is to gain access without downtime, single-user mode is probably a non-option.

(Though if some downtime is allowed, then it's probably possible to get into single-user mode, then manually start the relevant daemon and its dependencies, basically doing whatever init would normally do; I've done this in my homelab on Slackware, but it ain't something I'd be excited to do in production, and systemd probably complicates things further)

Also, not sure about other distros, but I recall that Ubuntu normally requires a root password even for single user mode. You might be better off using a boot disk and chrooting your way in.
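The boot-disk route looks roughly like this (a sketch - the device names and mount points are assumptions, adjust for your setup):

  # after booting from rescue media or attaching the volume elsewhere:
  sudo mount /dev/xvdf1 /mnt
  for d in dev proc sys; do sudo mount --bind /$d /mnt/$d; done
  sudo chroot /mnt /bin/bash
  passwd root        # or fix sshd_config, authorized_keys, fstab, ...
  exit               # then unmount and reboot into the repaired system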


Not without rebooting, no. None of the sysrq keys provide access to the system. Even if they did, switching into single-user mode would result in downtime for the running service.


Not by default as far as I'm aware. Though, it's been a long time since I've used a desktop Linux distribution. My experience is mostly with serial access for servers--Linux, BSD, and Solaris. I much prefer serial ports for backup administrative access as there's little chance of a misconfiguration. Whereas w/ port failover and other IPMI, BMC, etc. nonsense, if you disconnect the ethernet cable from the dedicated network management port you may still have admin/admin access live on the network.


Once upon a time, in another life it seems, we had a modem hooked to a serial multiplexer in the "core room" in the library, and I swear I once used that to reboot a Solaris machine into single-user mode - but I probably telnet'ed in (this was before ssh), did the init 1 or whatever, and then dialed in to the serial multiplexer.


I once had a second-hand Sun Sparc system, and several Sun amd64 systems (discounted for startups), and I remember Sun having a fairly sophisticated serial-attached firmware console. Though a fail-open default configuration that permitted trivially bypassing authentication seems more like something one would see w/ an x86 BIOS or bootloader. I wouldn't claim the serial-attached bootloaders on my x86 lab machines don't somehow permit bypassing login(1) authentication. But at least w/ EC2 and other hypervisors that element should be out of the equation.


This is what I do at home (kvm serial console on sulogin with root's password locked), but I would never consider that for a multi-tenant cloud provider.

A simple "rely on IAM for access control" doesn't match with my idea of defense in depth.


The article says "the only requirement is that the root account has been assigned a password, as this is the one you will use to log in", but obviously that will only apply to the default Amazon Linux AMI configuration.
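If you do want serial login to work on a stock image, a minimal cloud-config sketch (assuming the AMI runs cloud-init; treat the password as temporary):

  #cloud-config
  chpasswd:
    expire: false
    list: |
      root:SomeTemporaryPassword
  ssh_pwauth: false   # keep password auth off for SSH itself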


Probably shouldn't have erased their SSH keys then.


I mean, giving customers the finger is one strategy. Bending over backwards for the ones that pay out the ass for a good support contract is another.

"Your business might not exist if our engineers didn't dedicate themselves to your problem even though we didn't need to according to our SLA" goes a real long way at reup negotiation time with the right sized business (not too big, not too small).


So, a lot of database systems aren't using EBS, for many reasons: that's why the local I/O instances exist. And so, yes: you don't build out like that unless you are prepared to just scrap the instance and rebuild it from WAL logs or have replication or whatever, but knowing "this would be a 30 second fix with a serial console" really makes taking all of that cost in stride painful.


That's what I was thinking... I mean, this is cool and all, but the only reason I'd ever muck around at this level is if I were actually responsible for the physical hardware and there was a cost associated with replacing it. The whole point of moving to the cloud is to pay somebody else to worry about anything I could conceivably need this for.


The FreeBSD kernel debugger is available via serial console and is really useful when it's useful. If you manage to have a system that breaks your kernel, chances are switching instances won't help, and figuring it out will help. If it crashes, you can get a core dump, but if it loops in a bad place and that only breaks some things, the debugger can show things quickly.

Also serial console is pretty handy when you push bad firewall rules and don't want to throw away your instance.


A lot of medium sized companies moved to the cloud without understanding the tradeoffs. And they have big pockets when it matters.


I suspect the percentage of “cloud servers” that are just standard Linux boxes running relatively standard applications is quite high.


This seems to be such an obvious feature that I'm surprised they are only adding this now.

I'm not an expert in hypervisors or anything like that and so I'm wondering what was stopping them from adding it in the past?


Likely they just had no need for it themselves. Might be that they didn't prioritize having a feature over the risk of someone taking over their hypervisors thanks to a buggy serial port emulator.

Pretty much all hypervisors support serial consoles, but usually those interfaces are limited to trusted admins. For something like AWS, they'll also have to connect it from the hypervisor hosts into their public UI, and they can't trust the users.


They probably don't need it because they are much more likely to follow best practise "treat servers as cattle, not pets".

If an instance wedges itself into a state where I need console access, I'd just kill it and provision a replacement (ideally, my monitoring and automation will have done that already and not even have woken me up to tell me).

I'm not sure I'd be at all comfortable having irreplaceable single points of failure in AWS. (Though I do recognise that people use it that way all the time...)


"cattle vs. pets" misses some important nuance though, especially the way it's usually used to emphasize how you need to architect systems in a cloud environment that way because you have no way of fixing some issues.

What it is actually saying is not that it's a good approach to just throw away servers when they start having problems. What's important is having that ability when it is necessary.

Servers don't just randomly fail. If your instance goes down, sure, your automation will recover your system, but it will still be important to know why it failed, because it may be a symptom of a deeper issue.

It can be due to a hardware issue, but even when that happens, you don't throw away good hardware; you fix it and the server can return to full operation. In the cloud, though, you have no idea what the hardware is doing. Maybe the instance failed because the underlying host failed; maybe it didn't. You should still find out.


Sure, I totally agree about nuance and the "needing to architect it that way" meaning there. But once you have it architected that way, you then gain the ability to mostly ignore single failures, and only look for "deeper issues" if failures persist.

Even at not-very-high scale, AWS instances _do_ "just randomly fail", at least for all practical interpretations. I don't run anything like FAANG scale, only hundreds of instances rather than thousands or millions, and I see at least a few "random failures" a year (not including spot instances terminating, which I see in clumps every month or so).

I (almost) never try to repair a broken EC2 instance. Wherever I can, they'll be running totally stateless, and I just provision new ones and kill off old ones. I probably won't even bother investigating if it's a rare and singular problem on a known-reliable platform. If one instance wedges and gets replaced, I'll just have a note to investigate if it happens again any time soon. If we get a second failure, we'll go looking in logs and maybe keep and investigate the EBS volume.

For platforms running new-ish code, procedures are different. If we see dead instances after deployments we obviously investigate the new code/config there. But a fair chunk of clients where I am only get 6 or 12 (or even 24) month backend update cycles, if I've got dozens of instances running the same code for months on end and _one_ dies, we just bury it and replace it, and keep a closer eye on the rest of the "herd" for a week or two.


AWS instances usually "randomly fail" because the underlying hardware has issues. I still don't think it's truly random, but the problem is that you don't really get access to any direct indicators that the host is about to fail before it does. You don't even truly know how old the hardware is, so your risk mitigation strategy has to assume that anything can fail at any time with zero indication of issues beforehand.

When you manage your own physical servers, you have more knowledge of your risk. The actual time of failure will still be random, but if you've been running a host for 5 years straight, you know the risk is growing.

But we're mostly agreeing here. In the scenario where you throw away a "randomly failed" instance, the historical stability is good evidence that it is due to a hardware failure, and you can just replace the instance and move on.


I'd use it to troubleshoot quirky AMIs which simply do not boot in a specific setting no matter how many times I try.

Anything more esoteric than "a normal Ubuntu" can have a bug, e.g. it hangs with three network interfaces or similar.


How do you debug broken instances?


Typically the goal is to architect the system such that you don't really care. If it's stateful, there's some other replica. Promote that and then spin up a new replica from a backup and roll it forward. If it's stateless then just kill it and spin up another.


But what if there is a bug that’s repeatable / pops up on the regular?


Same as a herd of cattle, if a bunch of them get sick in similar ways you change your process to one where you can find out why. But just one? Shoot it and bury it. Then keep an eye out on the rest in case it's a developing pattern.


Shut them down and mount the volume on a new host.


Lower end VPS providers have offered full console access for years. For example:

https://www.linode.com/docs/guides/using-the-linode-shell-li...


The biggest reason is probably security. This is not something you want to take chances with.


That's my take too.

They've gotten so far without this functionality that I have to wonder what finally tipped the balance into their offering it.


My guess is that an important vendor is shipping AMIs without sshd, and they need an "emergency back door".


<cynical hat>

s/vendor is shipping AMIs without sshd, and they/TLA/


I get the sense they are very conservative with their feature set there? Nested VMs, for example, are supported in GCP, Azure and Oracle clouds I think, but not AWS. VM migration too, I think, exists in GCP at least, but not AWS. It's interesting.


Disclosure: I work for AWS building cloud infrastructure.

I wouldn't assume that VM migration does not exist in AWS. The overall design and implementation of Google's infrastructure somewhat mandated the development of live migration support from day one. AWS was designed and built differently, and some types of events that force live migration in GCE do not exist in AWS.

One specific example from Google's VM Live Migration At Scale paper [1] is "Regular maintenance on the power infrastructure in our data centers requires powering down subsets of machines for extended periods of time". The power infrastructure at AWS is designed to be redundant and concurrently maintainable, which removes a significant need for workload mobility within the datacenter.

Personally, I think it was a very good idea to turn the thing that had to be built to launch into a marketed differentiated feature. But that doesn't mean that AWS doesn't have an ability to live migrate some workloads if it is able to do so without disrupting customers, or if it delivers a better experience than alternatives (e.g., instance degrade notices).

[1] https://dl.acm.org/doi/10.1145/3186411.3186415


Who needs nested virtualization when you have bare metal?

VM Migration is only for maintenance on GCP -- and customers can't control it, just Google.

AWS can hot patch live systems in place without any downtime, so, that's better than a migration (which has a brown out / maintenance period)


> Who needs nested virtualization when you have bare metal?

AWS's non-bare-metal systems can boot in ~10s with enough tuning.

Their bare-metal systems take tens of minutes to boot.

Nested virtualization would allow scaling up and starting new nodes much faster.


If you have a use case for spinning up nodes fast -- what is the use for nested virtualization? Sounds like that should be containers on any underlying single-virtualized layer?


Sometimes you want the safer isolation of virtualization, and you want to spin up individual workload elements in virtual machines.


Disclosure: I worked on nested virtualization for GCE.

Even when you offer bare metal, it’s actually still nice to have nested virt! Otherwise, every node has to be a full sized host. So when you have a K8s cluster or similar with a pile of nodes and want to allow some teams to use it (e.g., Android emulator, firecracker, whatever), it’s really nice not to have to say “okay, this group requires full bare metal hosts that they manage themselves”.

tl;dr: nested virt is still a nice to have so that all your infrastructure looks the same.

Edit: Also, you can trigger migration yourself if you want (gcloud compute instances simulate-maintenance-event), but that's mostly to convince yourself that nothing bad will happen.


AWS is super slow to deprecate or remove features and services.

A few people were noting that SimpleDB has been deprecated: it's not listed in the AWS web console and you can't find docs for it anymore, but if you have a running instance, your API calls still work. And I think there have been many deprecation warnings since, plus migration messages. But they don't want to break existing clients.

I'm guessing this is a similar case where they want to be really, really sure that it's worth offering the service.


> ...you can't find docs for it anymore

Here are the docs: https://aws.amazon.com/simpledb/


Ah, my bad. But the service is no longer listed in the AWS Console menus.


Because it never was, as it predated the AWS Management Console.


S3 also predates the console, what kind of argument is that?


"no longer" is weird phrasing if it never was there.


You know those moments in sitcoms when some people discuss something important and someone intervenes with some boring minutiae?


Perhaps patents in HP's iLO system?


I suppose this isn't much of a surprise, but it's kinda sorta pointless for Windows instances. I was hoping it would be something, but it's just a dump of the standard Windows system log output.

And adding to the fun, watching it on an initial instance bootup seems to block the process AWS uses to grab the encrypted password out of the log. So, it's not useful, and makes the instance a bit hard to remote into.


You can enable the SAC for Windows. Then, you can have a command prompt too over serial.


Fair enough, found the documentation:

https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/troub...

It does require connecting to the working instance first and setting things up, which is less than ideal, but I get why it's the case. I guess I'm off to configure the Windows instances I care about, so I have a way to troubleshoot things in the future.
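For reference, the guts of that setup are a couple of bcdedit commands plus a reboot (a sketch - the linked doc has the full steps, including the SAC service bits):

  bcdedit /ems {current} on
  bcdedit /emssettings EMSPORT:1 EMSBAUDRATE:115200
  shutdown /r /t 0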


This is super useful if you are in the game of building images, or heavily tweaking init systems and/or the kernel.

For general consumers, not much value IMHO.


I see a ton of value when you have instances on a private network with no shell access and want to debug them without setting up a bastion instance.


AWS's (poorly named) SSM Session Manager service already allows that.

https://docs.aws.amazon.com/systems-manager/latest/userguide...
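Once the agent, the instance role, and the CLI's Session Manager plugin are in place, it's a one-liner (instance ID is a placeholder):

  aws ssm start-session --target i-0123456789abcdef0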


It requires an agent.


Yep, this might very well enable whole deployments where there is no SSH access anywhere.


I manage the AWS infrastructure for a medium sized SaaS company and we have no SSH access to any of our ~20 servers. The only access is through AWS Systems Manager Session Manager. Not managing keys is a major upside.


Yep, I'm here for the init system debugging too.


I would have loved to have that when working on building AMIs for custom Linux systems, e.g. https://github.com/cogini/buildroot_ec2 and https://github.com/cogini/nerves_system_ec2

I spent a lot of time looking at console screenshots of machines that would not boot and iterating to figure out the problem.


Not gonna lie... if I ever got into a situation where I needed serial access to an EC2 instance, I'd just retire the EC2 instance and spin up another one.


Disclosure: I used to work on GCE (and even helped push our serial console access years ago).

That’s a good default posture. What sucks is when you’re trying to debug a system that has OOM-killed sshd and then is behaving generally poorly. If you replace your instance with another one, you just get another OOM kill.

At this point, without interactive serial port access, you get to replace whatever you’ve got on the box with more logging statements. That’s a totally reasonable approach, but with interactive serial ports you can poke at it and root cause a lot faster.

Edit: Also, Linux seems to always kill sshd first. (Part of this is survivorship bias, of course).


I did a lot of kernel development with GCP instances. Having the serial port enabled me to use a remote debugger and made it super great.


I don't understand your last statement. Linux wouldn't kill sshd first unless it somehow was the highest memory consumer. [0]

[0] https://unix.stackexchange.com/a/153586


The OOM score isn’t strictly ordered by memory usage. The oom score adjustment is usually the cause.

Amusingly, this finally forced me to find bugs like this one:

https://bugzilla.redhat.com/show_bug.cgi?id=1071290

(All processes started under a remote shell get adjustment -1000, which is basically never shoot me).

There are a few related to setting up the sshd adjustment itself as well.

So, looks like a config problem! Thanks for pointing this out.
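For anyone wanting to check their own boxes - a quick sketch, assuming a systemd-managed sshd (the unit name varies by distro):

  # what is the daemon's current adjustment?
  cat /proc/$(pgrep -o sshd)/oom_score_adj
  # pin it explicitly via a drop-in if you don't want to rely on defaults
  sudo systemctl edit sshd.service
  # then add:
  #   [Service]
  #   OOMScoreAdjust=-1000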


You're right that it's not _strictly_ memory consumption and that other criteria and overrides exist, but memory consumption is highly weighted.

Regarding SSH, if you enable sshd debug logging you can see that sshd sets its own score to the minimum possible [0] which is why your comment about sshd being targeted still doesn't make sense to me. I actually didn't know it was sshd doing this on its own till I ran this:

  server ~ # grep oom_score_adj /usr/sbin/sshd
  grep: /usr/sbin/sshd: binary file matches
...which is fascinating and clever. That's when I checked the source code linked at "[0]". I now finally have an answer as to why I've seen dmesg memory stat dumps display different oom_score_adj values for sshd. I always thought _something_ was smart enough to know that we don't want to risk killing sshd, but I didn't know what that _something_ was. It turns out it was the daemon itself.

  Oct 13 23:20:38 server kernel: Mem-Info:
  ...
  Oct 13 23:20:38 server kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
  Oct 13 23:20:38 server kernel: [ 2455]    60  2455  1778653   273114     710      10        0             0 mysqld
  ...
  Oct 13 23:20:38 server kernel: [12085]   207 12085    22574      673      32       3        0             0 tlsmgr
  Oct 13 23:20:38 server kernel: [ 4238]     0  4238     9234      518      20       3        0         -1000 systemd-udevd
  Oct 13 23:20:38 server kernel: [12278]     0 12278    88107     5597     136       4        0             0 apache2
  Oct 13 23:20:38 server kernel: [17222]     0 17222  1258983   142035     505       8        0             0 qemu-system-x86
  ...
  Oct 13 23:20:38 server kernel: [21069]     0 21069     5033      487      14       4        0             0 bash
  Oct 13 23:20:38 server kernel: [15935]     0 15935     7081      487      16       3        0         -1000 sshd
  ...
In retrospect it makes a lot of sense, especially considering sshd runs as root -- it has complete ability to do that. And it's not like anything else would know the importance of sshd except for itself.

However I still don't understand your comment about the Linux OOM killer wanting to kill sshd "first" (or _ever_ based on these renewed findings!) Can you elaborate?

[0] https://github.com/openssh/openssh-portable/blob/e51dc7fab61...


If you have a large amount of memory, are doing decent amounts of IO, and linux OOMs, the system becomes unresponsive for many minutes before killing any process. At which point, ssh sessions timeout endlessly. A serial console stands a chance.

Then there's also any case where you're debugging AMI builds and need to fix grub or the init system without waiting 20 minutes for a new AMI build each time.

Also, the existing console log feature in AWS is insultingly not real time. It doesn't typically update at all unless you're within minutes of boot or trigger a reboot and it only buffers something like 4kb so a reboot can easily fully replace the logs. This really sucks when you're trying to get the debug console output, so this feature finally solves that.
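(For anyone reaching for it anyway: on Nitro instances you can at least request the most recent output rather than the boot-time snapshot; the instance ID below is a placeholder.)

  aws ec2 get-console-output --instance-id i-0123456789abcdef0 --latest --output text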


Why would linux trigger OOM if you have a large amount of memory available? Or, what did you mean by "large amount of memory"?

Also, why would an SSH session, which is entirely in memory, time out because of I/O thrashing? You can disconnect the hard drive that sshd and/or the OS is running from and your SSH connections to that machine won't break. If you run some commands that aren't cached in memory you'll naturally get critical I/O errors, but it won't cause a disconnect on the SSH layer.


By large amount of memory, I meant systems that have large amounts of memory that is nearly full. For example, a server with 256GB of memory and 255.9GB in use.

SSH is purely in memory; however, in order to allocate memory for it, linux will pull "free" memory out of whatever heavily fragmented corners it can find it in. And it may even need to perform disk I/O to free memory that was tied up in various disk caches.

People refer to this as a "livelock", where Linux is going crazy doing lots of stuff but from userspace the system is completely frozen.

Facebook developed OOMD, a userspace oom killer to deal with this issue, their release blog post references the 30 minute livelocks they face: https://engineering.fb.com/2018/07/19/production-engineering...

They actually have gone so far as to submit kernel patches for newer PSI (pressure stall information) interfaces, which they use in oomd to better detect stalls due to this thrashing.
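On a recent kernel (4.20+, with PSI enabled) you can watch that stall pressure directly - sample output from an idle box:

  $ cat /proc/pressure/memory
  some avg10=0.00 avg60=0.00 avg300=0.00 total=0
  full avg10=0.00 avg60=0.00 avg300=0.00 total=0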


> However I still don't understand your comment about the Linux OOM killer wanting to kill sshd "first" (or _ever_ based on these renewed findings!) Can you elaborate?

I suspect running low on memory can trigger symptoms that look like sshd failing.

sshd gets paged out (or something else you need for a successful login). Un-paging becomes incredibly slow, as there's lots of IO going on from all the paging. Anything garbage-collected starts running GC constantly, using 100% CPU.

Then your attempt to SSH times out - and with no access to list running processes, one naturally concludes sshd has failed.


Yeah, maybe in addition to my survivorship bias from RHEL6-era images, I am mentally conflating samples from “we OOM killed sshd” and “we are swapping violently; we won’t get in via sshd”.

In fact, I would guess (especially given all this investigation!) that it’s much more likely that an inaccessible box is just under too much memory pressure for sshd to respond.

Amusingly, the answer is still the same: serial port! :).

Thanks again for all the pointers (to everyone in this thread).


That code is 12 years old: https://github.com/openssh/openssh-portable/commit/c8802aac2...

So maybe the parent post was remembering a time before then.


Most of the time, that is what I'd suggest doing. However, this is like the 4 wheel drive on my truck. I usually don't need it, but when I need it, I'm really in a bind and glad to have it.


It makes debugging the construction of instances way more annoying, particularly if you are dealing with any local networking on the machine. The turnaround time for testing something goes from tens of seconds to multiple minutes (at best) without a serial console.


If something like this happens to an instance, I don't trust the VM's state even after recovery and I'd retire & replace regardless. But I'd love to have serial access to do a root-cause analysis to prevent the issue from occurring in the future.


that’s the right answer. unless you are doing something super-specialized or esoteric you should not touch this. and as a matter of fact if you are doing things properly you also don’t have ssh access. remember? cattle not pets!


Was this really not available before? It is such a basic requirement for virtual machines. Other providers like Linode have had this since 2009 if not earlier.


I find it interesting that half of the comments indicate that people don't really understand why this would be needed and the other half are surprised that it hasn't been offered before.

Personally, I'm in the first camp. I'm used to taking instances for granted on the rare occasion that a low-level issue arises and just promoting a replica or trashing it if it's stateless.

I'm assuming the use case where you care about fixing the type of issues this feature helps debug is fairly esoteric?


Proxmox has had this for quite a while.

https://pve.proxmox.com/wiki/Serial_Terminal


Is this the cloud equivalent to hooking up a monitor and keyboard to a server?

I remember having an EC2 terminal in the browser years ago and recently I went back and it seemed far more locked down.


Sort of, a few key differences:

- usually these are running with very few dependencies in the userspace stack, such as a getty directly spawned by init (or the modern equivalent arrangement). This means that it's accessible even if your networking stack is not working, or if you screw up your firewall config. You can even make sure your init / getty / bash are statically linked so that not even ld.so breakage will stop you.

- it looks like they're also enabling Linux's Magic SysRq features, which gives you some very raw hooks into the kernel itself.
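The SysRq side only works if the kernel allows it, and over serial the trigger is a BREAK followed by the command key - a sketch:

  # allow all SysRq functions (or use a bitmask for something narrower)
  sudo sysctl -w kernel.sysrq=1
  echo 'kernel.sysrq = 1' | sudo tee /etc/sysctl.d/99-sysrq.conf
  # over the serial console: send BREAK, then within ~5 seconds the command key, e.g. 'b' to reboot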


I wonder if this is implemented at the hypervisor level (gives you access to the "console") or if this is actually implemented as a serial port.


What would be the difference exactly? The console of a Linux VM is either the emulated serial port or the emulated VGA device + emulated keyboard.

I once implemented a tool called virt-dmesg which read out the log_buf from a running Linux kernel (surprisingly useful for those tricky crashes, but difficult from a maintenance point of view, so the tool is now abandoned). I suppose that's the closest you could get to a "real" console at the hypervisor level.


A "text console" is almost always there but a serial console has to be enabled on the instance itself.

Based on a quick SSH it looks like it's a serial thing:

    root        1221 /sbin/agetty -o -p -- \u --keep-baud 115200,38400,9600 ttyS0 vt220
    root        1229 /sbin/agetty -o -p -- \u --noclear tty1 linux
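And for the kernel itself (and early boot) to log there, it needs console= on the cmdline, typically set via GRUB - a sketch:

    # /etc/default/grub
    GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,115200n8"
    # then regenerate, e.g. grub2-mkconfig -o /boot/grub2/grub.cfg (or update-grub on Debian/Ubuntu)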


... or the parallel port. A lot of people forget that one.

* http://jdebp.uk./Softwares/nosh/guide/commands/linux-console...


Am I missing something? This is for Nitro instances right, not EC2?

Nitro is when you get the whole bare metal server and you need to run your own hypervisor/OS (which is why they mentioned VMware). This hasn't been available to the public very long (like a year or two). Maybe I am missing something here, but I think a lot of comments seem to be misunderstanding what this is.


Disclaimer: I work at AWS, primarily in the compute space, but I'm not speaking in an official capacity.

As fguerraz mentioned, modern AWS instance families are basically all powered by Nitro, which refers to the ecosystem around the hypervisor and hardware acceleration cards utilized. https://aws.amazon.com/ec2/nitro/


Thank you very much! I thought nitro referred to the bare metal offering only.


All modern instance families (regardless of their type) are powered by Nitro.


What's the difference between this and the SSM agent? Can I replace SSM agent configs for this?


Disclaimer: I work at AWS, primarily in the compute space, but I'm not speaking in an official capacity.

I would say they fulfill different purposes. The SSM agent has quite a bit of additional functionality, even within the Session Manager portion. It's more of your solution for online, general day to day access.

Serial console will let you fix issues when you have lost the ability to boot an instance, or network connectivity has failed. When SSH or Session Manager are available, I personally would opt to utilize them over the serial console. But if I have an instance that I can't reach via those, am unable to replace it for whatever reason, and need to bring it back online, serial console would be what I would reach for.


Oh good, I've always wanted to be able to raise the elephants in the cloud...


It's about time. Wow.


This is amazing.


Cloud's catching up. :tada:


"Accessing your serial console is only 0.04$/per character, except if you bid on peak console then ..."

I made that up, but it would totally not surprise anyone, would it?


I did certifications and couldn't believe that it wasn't $5/missed question the first time around :D



