USB Firmware Bug

It seems like the firmware on my Shapeoko 4 glitches out and can cause a crash if I open and then immediately close the serial port on either a Linux or macOS machine. The same behavior does not occur on Windows.

Here’s the code:

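#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	// Opening the port is all it takes; no data is ever written.
	int fd = open("/dev/tty.usbmodem1101", O_RDWR | O_NOCTTY | O_NONBLOCK);
	if (fd == -1)
	{
		perror("open_port: Unable to open /dev/tty.usbmodem1101");
		return -1;
	}

	// Uncommenting this 1 second sleep avoids the glitch.
	//sleep(1);

	// No explicit close(fd): the kernel closes the port when the process exits.
	return 0;
}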

Without the sleep, as soon as this program runs the Shapeoko will immediately glitch out and begin moving aggressively in +Y and +Z directions. It will keep doing that and crash (if I let it) until I kill power to the Shapeoko. Disconnecting the USB connection does not help.

With the 1 second sleep, it works fine.

I tested using a MacBook, a Raspberry Pi, and a Microsoft Surface (with Windows specific code to open the port). Only the Microsoft Surface did not induce this behavior.

This seems like a firmware bug, as it’s rather dangerous behavior. I’m guessing something is wrong in the ATmega16U2, which IIUC runs the USB->Serial code. My guess is it starts some kind of setup code when the serial port gets opened, but gets interrupted if the port closes too quickly, leaving itself in an undefined state. With the sleep it has time to finish, so the bug doesn’t occur. Not sure why Windows doesn’t induce the bug though. Maybe Windows takes longer opening the port than *nix?

AFAIK the code running on that interface chip isn’t available anywhere, right? So not much more debugging I can do on my end for now.


Quick update.

I hooked up a serial monitor to the UART pins on the Motion board to see what activity, if any, is happening when this bug occurs.

Indeed, when the bug gets triggered, I see “8” spammed on the UART line. Not sure what Grbl would interpret that as, but it’s definitely not what the Interface firmware should be doing :stuck_out_tongue:

I think that 16U2 chip runs a pure vanilla arduino-usbserial firmware, so you could deep dive into that source code if you fancied it.

Once upon a time I messed around with the LUFA 100807 release, which includes the arduino-usbserial code, and was able to rebuild it and flash it to an Arduino Uno. Not sure about flashing to a Shapeoko controller (never had a need to, so I never did).

It still sounds very strange to me that, even with garbage data coming into Grbl at power-up, the machine would move at all. Grbl normally won’t allow anything to move until you home the machine. Unless of course you are using your own version of Grbl on the Shapeoko where you disabled homing, and then…

Also, shouldn’t you explicitly call close(fd) in that example?


Good idea. I’ll grab an Uno board and see if I can replicate on there.

Unless of course you are using your own version of Grbl on the Shapeoko where you disabled homing, and then…

Nope, all stock firmware.

It’s definitely odd. Plus, “888888888888” etc. isn’t even valid G-code to Grbl, as far as I can tell. Hopefully I can load Grbl onto the Uno and test in isolation there.

Also, shouldn’t you explicitly call close(fd) in that example?

The bug occurs either way (the kernel is going to call close for us).


Got a video?
I’m with @Julien on this…
I can’t think of any non-valid-command situation that would cause/allow motion. Are you capturing any of the data sent?
There’s nothing on the 16U2 that is Grbl specific, but if you replicate on an UNO, what about on a clone with a different USB serial IC?

Got a video?

Sure: IMG_0786.mov (on Dropbox)

Including audio, so you can hear the motors “grinding” during the glitch instead of their usual motion whine.

Are you capturing any of the data sent?

Yup, I documented that in a comment above. I hooked up to the UART header on the Carbide Motion board and during the glitch all I received was “8” repeated seemingly infinitely (until poweroff).

I’ll grab an Uno board and see if I can replicate on there.

So I tried to replicate on an Uno Rev3, which uses the same Atmel chips as the Carbide Motion board, but the bug does not occur. I dumped the Uno’s firmware and verified that it’s using the stock usb-serial firmware (Arduino-usbserial-atmega16u2-Uno-Rev3.hex from the arduino/ArduinoCore-avr repository on GitHub).

So my next step will be to dump the Carbide Motion’s 16U2 firmware and see what it’s running. Looks like ISP1 is for the 16U2, so fingers crossed.


Really weird. Is it repeatedly connecting and disconnecting?

The PC? No, I just run the program once. It opens the serial port and then exits (which de facto closes the port).

I’ve dumped the 16U2 on my Carbide Motion board. It’s not running vanilla usbserial firmware, which I expected since the VID:PID are different. But as to how much of the code is actually different, we’ll see. Poking at it now.

Do you have any startup blocks set?

Send $N and Grbl will report back.

Grbl 1.1f ['$' for help]
[MSG:'$H'|'$X' to unlock]
$N0=
$N1=
ok
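
For anyone who wants to script the $N query rather than type it into a terminal, here is a minimal sketch in C. The helper names grbl_send_line and grbl_read_reply are made up for illustration; they are not part of Grbl or of any library. The sketch assumes fd is a serial port that is already open and configured (Grbl 1.1 defaults to 115200 baud, 8N1) and that Grbl ends each reply with a line starting with "ok" or "error". Port setup is deliberately left out, since merely opening the port is what this thread is about.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write one command followed by a newline.
   Returns 0 on success, -1 on failure. */
int grbl_send_line(int fd, const char *cmd)
{
    char buf[128];
    int n = snprintf(buf, sizeof buf, "%s\n", cmd);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1; /* command too long for the buffer */
    return write(fd, buf, (size_t)n) == n ? 0 : -1;
}

/* Hypothetical helper: collect reply text into out until a line that
   starts with "ok" or "error" arrives. Returns 0 if such a line was
   seen, -1 on EOF, read error, or a full buffer. out is always
   NUL-terminated when cap > 0. */
int grbl_read_reply(int fd, char *out, size_t cap)
{
    size_t len = 0;        /* bytes stored so far */
    size_t line_start = 0; /* index where the current line begins */

    if (cap == 0)
        return -1;
    while (len + 1 < cap) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n <= 0)
            break; /* EOF or error */
        out[len++] = c;
        if (c == '\n') {
            const char *line = out + line_start;
            if (strncmp(line, "ok", 2) == 0 ||
                strncmp(line, "error", 5) == 0) {
                out[len] = '\0';
                return 0;
            }
            line_start = len;
        }
    }
    out[len] = '\0';
    return -1;
}
```

Calling grbl_send_line(fd, "$N") and then grbl_read_reply(fd, buf, sizeof buf) should leave the $N0= and $N1= lines plus the final ok in buf. With a non-blocking descriptor the read can return early, so this sketch is only suitable for a blocking one.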

I’ve got some more … interesting … data on this bug.

First off, I was not able to reproduce the bug today using my RasPi, even though it triggered from there before (I haven’t played with this bug in two weeks; been busy actually using the Shapeoko :stuck_out_tongue: ).

But I am still able to trigger it with my MacBook. Not only that, but after getting the machine to trigger with the MacBook, I tried the Pi again, and while the motors didn’t glitch out, the behavior of the RXI line changed compared to before using the MacBook. About 7 random bytes would spew across RXI before it settled to a high state. Incredibly odd.

While connected to my MacBook and repeatedly triggering the bug with the motors disconnected, I scoped some signals on the Motion board. Purple is always DTR.

[Scope capture: yellow is TXO]

[Scope capture: same as above, zoomed in]

[Scope capture: yellow is RESET (from GRBL chip)]

[Scope capture: yellow is RXI]

When the bug is not triggered, TXO and RXI are normally silent. RESET behaves the same in both cases. The fact that it spikes up above 5V is a bit disconcerting, though.

Again, in none of these experiments do I send any serial data over the USB connection. Just open the serial port and immediately close it. The Shapeoko is power cycled between each test.


With some more testing I’ve discovered that the bug depends on how long DTR is held low.

The amount of time DTR is held low is related to how long the test program sleeps. In the case of no sleep, just an open and close, DTR is held low for just 2ms on my RasPi. The bug does not trigger under these conditions.

But by adding a usleep I can vary the amount of time DTR is held low and trigger the bug under certain conditions using the RasPi. Here are my findings:

  • 1ms sleep => No bug
  • 2ms sleep => Bug
  • 10ms sleep => Bug
  • 30ms sleep => Bug
  • 50ms sleep => Bug
  • 100ms sleep => No bug
  • 200ms sleep => No bug
  • 300ms sleep => No bug

N.B. ~2ms is added to total DTR low time due to the time taken to open/close
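
A test like the one above can be sketched as a small C helper, shown here as an illustration rather than the exact program used. The name hold_port_open and its parameters are hypothetical. It assumes, as the scope traces in this thread suggest, that the OS driver pulls DTR low on open and releases it on close, so the hold time plus the roughly 2ms open/close overhead sets how long DTR stays low.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open the serial port, keep it open for hold_ms
   milliseconds, then close it explicitly.
   Returns 0 on success, -1 if the port could not be opened. */
int hold_port_open(const char *path, unsigned int hold_ms)
{
    /* Same flags as the original test program. */
    int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd == -1) {
        perror(path);
        return -1;
    }

    /* usleep() is only specified for values below one second,
       so whole seconds go through sleep(). */
    if (hold_ms >= 1000)
        sleep(hold_ms / 1000);
    if (hold_ms % 1000 != 0)
        usleep((hold_ms % 1000) * 1000u);

    /* Explicit close; the kernel would do this at exit anyway. */
    close(fd);
    return 0;
}
```

For example, hold_port_open("/dev/ttyACM0", 10) would correspond to the 10ms row, where /dev/ttyACM0 is only a guess at the device name on a Pi. Scheduling jitter means the real DTR low time still has to be confirmed on a scope.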

In all cases tested I scoped both DTR and RESET (from GRBL’s chip). In all cases RESET behaves as depicted in my previously posted scope images. DTR going low causes RESET to immediately drop to 0V and then recover to 5V. DTR going back to 5V causes RESET to spike up to 10V and then decay back to 5V. Decay time is 5ms.

There’s usually an RC circuit on RESET lines, so the decay is normal. 5ms matches the Uno schematic RC filter.
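
As a rough check of that 5ms figure, here is the arithmetic for an ideal capacitor-coupled reset line. The component values are an assumption: 10 kΩ pull-up and 100 nF series capacitor, as on the Uno Rev3 schematic. They have not been verified on the Motion board.

```latex
\tau = RC = 10\,\mathrm{k\Omega} \times 100\,\mathrm{nF} = 1\,\mathrm{ms}

V_{\mathrm{RESET}}(t) = V_{CC} + \Delta V \, e^{-t/\tau},
\qquad
\Delta V =
\begin{cases}
-5\,\mathrm{V} & \text{DTR falling edge} \\
+5\,\mathrm{V} & \text{DTR rising edge}
\end{cases}

V_{\mathrm{RESET}}(0^{+}) = 5\,\mathrm{V} + 5\,\mathrm{V} = 10\,\mathrm{V}
\quad \text{(rising edge, no clamping)}

\left| V_{\mathrm{RESET}}(5\tau) - V_{CC} \right| = 5\,\mathrm{V} \cdot e^{-5} \approx 34\,\mathrm{mV}
```

So with these assumed values the line is back within about 1% of 5V after 5ms, and an unclamped rising edge would peak at 10V. Anything that clamps RESET to the supply rail would lower that peak.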

I also scoped my Uno, which is loaded with copies of the firmware from the Motion board. I’ve never been able to trigger the bug on my Uno, and even varying the DTR low time I’m still not able to. And while it shows a spike on RESET when DTR rises, it’s only up to 6V. That matches a quick simulation on falstad, so I guess that’s normal.

I’m a bit concerned that the Motion board’s RESET is spiking to 10V, though. The 328P can tolerate up to 13V on that pin, but still. Makes me wonder if that’s what is making it glitch out?


So, this is obviously an edge case that most users wouldn’t encounter, but maybe something for @Jorge to look into.
Curious… what made you run the initial program that showed the behavior?
So the 328 is resetting repeatedly, very fast?

I was making some controller software for fun. Step one, connect to the serial port. Step two… discover doing that made the Shapeoko glitch out and crash itself :stuck_out_tongue:


Wow! Thanks for putting the work in to discover and document this!! I admit I don’t know* how serious this issue may be, but it is great to see that those that have the special talent to help actually do help naturally…

*(I can only interpret the facial expressions of the actors in this play, as they are speaking in a language that is foreign to me, but I feel I can grasp the message well enough, well done!)